Apr 2022
39m 28s

Apache Beam with Kenneth Knowles and Pab...

GOOGLE CLOUD PLATFORM
About this episode

On the podcast this week, your hosts Stephanie Wong and Mark Mirchandani talk about the data processing tool Apache Beam with guests Pablo Estrada and Kenneth Knowles.

Kenn starts us off with an overview of how Apache Beam began and how Cloud Dataflow was involved. Beam’s unified batch and stream model and its emphasis on correctness won support from developers early on and continue to attract users. Pablo helps us understand why Beam is a better option for certain projects that need to process large amounts of data. Our guests describe how Beam may be a better fit than microservices that could become obsolete as company needs change.

Next, we step back and take a look at why combining batch and stream is the gold standard of data processing: it balances low latency against the ease of “being done” with data collection. A focus on correct data, and on processing that data correctly, is core to Beam. With good data, processing becomes easier, more reliable, and cheaper. Kenn gives examples of how things can go wrong with bad data processing. Beam strives for the right combination of low latency, correct data, and affordability. Users can choose where to run Beam pipelines, from other Apache projects to Dataflow, which gives them a lot of flexibility. Our guests talk about the pros and cons of some of these options, and we hear examples of how companies are using Beam along with supporting software to solve data processing challenges.

To get started with Beam, check out Beam College or attend Beam Summit 2022.

Kenneth Knowles

Kenn Knowles is chair of the Apache Beam Project Management Committee. Kenn has been working on Google Cloud Dataflow—Google’s Beam backend—since 2014. Kenn holds a PhD in programming languages from the University of California, Santa Cruz.

Pablo Estrada

Pablo is a Software Engineer at Google and a management committee member for Apache Beam. Pablo enjoys working on open source and has worked all across the Apache Beam stack.

Cool things of the week
  • Under the sea: Building the world’s fiber optic internet video
  • Google Data Cloud Summit site
  • It’s official—Google Distributed Cloud Edge is generally available blog
    • GCP Podcast Episode 228: Fastly with Tyler McMullen podcast
  • Save big by temporarily suspending unneeded Compute Engine VMs—now GA blog
Interview
  • Apache Beam site
  • Apache Beam Documentation site
  • Dataflow site
  • Apache Flink site
  • Apache Spark site
  • Apache Samza site
  • Apache Nemo site
  • Spanner site
  • BigQuery site
  • Beam College site
  • Beam College on GitHub site
  • Beam Developer Mailing List email
  • Beam User Mailing List email
  • Beam Summit site
What’s something cool you’re working on?

Mark is working on a new Apache Beam video series, Getting Started With Apache Beam.

Hosts

Stephanie Wong and Mark Mirchandani
