Summary
In this episode Jevin Maltais talks about the practical realities of building reliable, product-focused streaming systems with Kafka. Jevin shares lessons from roles at Zapier, Humi, and Clio, where real-time synchronization, customer data unification, and document sync at scale highlighted both the strengths and common misuses of Kafka. He digs into using events as the source of truth, materialized views with KTables, and how schema registries and type safety prevent downstream breakage. Jevin explains why teams often reach for heavyweight Kafka clusters without leveraging Streams, Connect, or interactive queries, and how his project, TypeStream, aims to make those capabilities accessible via config-as-code while keeping a thin abstraction and clear escape hatches. He also explores trade-offs across Kafka-compatible alternatives, CDC with Debezium in the real world, and where abstractions should stop so teams can scale responsibility as complexity grows.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.
This episode is sponsored by DataDriven.io, the free data engineering interview prep platform built by data engineers for data engineers. Ever walked into a data engineering interview and gotten a question that has nothing to do with real data engineering work? Interviewing is its own skill, separate from the job. Watch your code execute live, inspect Spark internals, and whiteboard your data models and pipelines and defend your decisions. Unlike SQL-only or Python-only practice, DataDriven.io covers the full interview loop: star schemas, slowly changing dimensions, grain and fact table design, idempotency, watermarks, dead letter queues, change data capture, and backpressure. Every question comes from real Data Engineer interview loops at Google, Amazon, Meta, Stripe, Databricks, Netflix, and Airbnb. Go to dataengineeringpodcast.com/datadriven today to start practicing.
Your host is Tobias Macey and today I'm interviewing Jevin Maltais about the challenges of building a reliable streaming

Interview

Introduction
How did you get involved in the area of data management?
Can you describe what Typestream is and the story behind it?
What are the common challenges that teams encounter when trying to build on top of Kafka?
How do those challenges/misconfigurations impact the team's ability to deliver on product goals?
What are the fundamental design aspects of Kafka that contribute to the difficulties that teams encounter when using it as an element of their architecture?
There have been numerous projects taking aim at Kafka, with varying approaches and degrees of effectiveness (e.g. RedPanda, AutoMQ, Pulsar). What are the tradeoffs that each of those approaches requires?
What makes the original Kafka project so resilient in the face of all of that competition?
Can you describe the architecture of Typestream and how each of the core elements contribute to a better user experience?
For teams who want to take advantage of streaming capabilities, but don't want to invest in becoming Kafka experts, what does the Typestream workflow look like?
If they don't want to manage the operational overhead of a Kafka cluster, how tightly coupled is Typestream to the original Kafka? (can someone use RedPanda or AutoMQ instead?)
What are the most interesting, innovative, or unexpected ways that you have seen Typestream used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Typestream?
When is Typestream the wrong choice?
What do you have planned for the future of Typestream?

Contact Info

Website

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA