logo
episode-header-image
Oct 27
1h 5m

Beyond the Perimeter: Practical Patterns...

Tobias Macey
About this episode
Summary
In this episode of the Data Engineering Podcast Matt Topper, president of UberEther, talks about the complex challenge of identity, credentials, and access control in modern data platforms. With the shift to composable ecosystems, integration burdens have exploded, fracturing governance and auditability across warehouses, lakes, files, vector stores, and streaming systems. Matt shares practical solutions, including propagating user identity via JWTs, externalizing policy with engines like OPA/Rego and Cedar, and using database proxies for native row/column security. He also explores catalog-driven governance, lineage-based label propagation, and OpenTDF for binding policies to data objects. The conversation covers machine-to-machine access, short-lived credentials, workload identity, and constraining access by interface choke points, as well as lessons from Zanzibar-style policy models and the human side of enforcement. Matt emphasizes the need for trust composition - unifying provenance, policy, and identity context - to answer questions about data access, usage, and intent across the entire data path.

Announcements
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Data teams everywhere face the same problem: they're forcing ML models, streaming data, and real-time processing through orchestration tools built for simple ETL. The result? Inflexible infrastructure that can't adapt to different workloads. That's why Cash App and Cisco rely on Prefect. Cash App's fraud detection team got what they needed - flexible compute options, isolated environments for custom packages, and seamless data exchange between workflows. Each model runs on the right infrastructure, whether that's high-memory machines or distributed compute. Orchestration is the foundation that determines whether your data team ships or struggles. ETL, ML model training, AI Engineering, Streaming - Prefect runs it all from ingestion to activation in one platform. Whoop and 1Password also trust Prefect for their data operations. If these industry leaders use Prefect for critical workflows, see what it can do for you at dataengineeringpodcast.com/prefect.
  • Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
  • Composable data infrastructure is great, until you spend all of your time gluing it together. Bruin is an open source framework, driven from the command line, that makes integration a breeze. Write Python and SQL to handle the business logic, and let Bruin handle the heavy lifting of data movement, lineage tracking, data quality monitoring, and governance enforcement. Bruin allows you to build end-to-end data workflows using AI, has connectors for hundreds of platforms, and helps data teams deliver faster. Teams that use Bruin need less engineering effort to process data and benefit from a fully integrated data platform. Go to dataengineeringpodcast.com/bruin today to get started. And for dbt Cloud customers, they'll give you $1,000 credit to migrate to Bruin Cloud.
  • Your host is Tobias Macey and today I'm interviewing Matt Topper about the challenges of managing identity and access controls in the context of data systems
Interview
  • Introduction
  • How did you get involved in the area of data management?
  • The data ecosystem is a uniquely challenging space for creating and enforcing technical controls for identity and access control. What are the key considerations for designing a strategy for addressing those challenges?
  • For data acess the off-the-shelf options are typically on either extreme of too coarse or too granular in their capabilities. What do you see as the major factors that contribute to that situation?
  • Data governance policies are often used as the primary means of identifying what data can be accesssed by whom, but translating that into enforceable constraints is often left as a secondary exercise. How can we as an industry make that a more manageable and sustainable practice?
  • How can the audit trails that are generated by data systems be used to inform the technical controls for identity and access?
  • How can the foundational technologies of our data platforms be improved to make identity and authz a more composable primitive?
  • How does the introduction of streaming/real-time data ingest and delivery complicate the challenges of security controls?
  • What are the most interesting, innovative, or unexpected ways that you have seen data teams address ICAM?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on ICAM?
  • What are the aspects of ICAM in data systems that you are paying close attention to?
    • What are your predictions for the industry adoption or enforcement of those controls?
Contact Info
Parting Question
  • From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Up next
Nov 16
State, Scale, and Signals: Rethinking Orchestration with Durable Execution
Summary&nbsp;<br />In this episode Preeti Somal, EVP of Engineering at Temporal, talks about the durable execution model and how it reshapes the way teams build reliable, stateful systems for data and AI. She explores Temporal’s code‑first programming model—workflows, activities, ... Show More
51m 46s
Nov 9
The AI Data Paradox: High Trust in Models, Low Trust in Data
Summary<br />In this episode of the Data Engineering Podcast Ariel Pohoryles, head of product marketing for Boomi's data management offerings, talks about a recent survey of 300 data leaders on how organizations are investing in data to scale AI. He shares a paradox uncovered in ... Show More
51m 35s
Nov 2
Bridging the AI–Data Gap: Collect, Curate, Serve
SummaryIn this episode of the Data Engineering Podcast Omri Lifshitz (CTO) and Ido Bronstein (CEO) of Upriver talk about the growing gap between AI's demand for high-quality data and organizations' current data practices. They discuss why AI accelerates both the supply and demand ... Show More
50m 40s
Recommended Episodes
Apr 2025
Specialized AI brains for physical industry
Everyone wants a piece of general purpose models. Instacart has deployed ChatGPT for recipes and meal planning. The Mayo Clinic is using it to summarize patient records. Schneider Electric is using an OpenAI LLM to generate sustainability reports. With such powerful models, what’ ... Show More
37m 2s
Oct 14
The connected world of energy | Special episode from Wood Mackenzie
<p>Host Ed Crooks talks to Jason Liu, Chief Executive of Wood Mackenzie and co-author (with Chief Analyst Simon Flowers) of a new book, Connected, about the fast-changing world of energy. They are also joined by Sunaina Ocalan, formerly Senior Director for Corporate Strategy &amp ... Show More
44m 5s
Sep 18
How People Actually Use ChatGPT
This episode of AI Daily Brief dives into two important reports on how people are really using AI tools like ChatGPT and Claude. OpenAI’s massive study with Harvard and NBER reveals consumer patterns across 1.5 million conversations, while Anthropic’s Economic Index tracks broade ... Show More
27m 39s
Mar 2025
#295 How To Get Hired As A Data Or AI Engineer with Deepak Goyal, CEO & Founder at Azurelib Academy
The role of data and AI engineers is more critical than ever. With organizations collecting massive amounts of data, the challenge lies in building efficient data infrastructures that can support AI systems and deliver actionable insights. But what does it take to become a succes ... Show More
52m 27s
Nov 2024
Model Plateaus and Enterprise AI Adoption with Cohere's Aidan Gomez
In this episode of No Priors, Sarah is joined by Aidan Gomez, cofounder and CEO of Cohere. Aidan reflects on his journey to co-authoring the groundbreaking 2017 paper, “Attention is All You Need,” during his internship, and shares his motivations for building Cohere, which delive ... Show More
44m 15s
Jul 2022
IoT, IIoT and Managing Edge Data
<p>Brian Gilmore (@BrianMGilmore, Director IoT/Emerging Technology @InfluxDB) talks about Edge and Industrial Edge Computing, as well as application and data challenges at the edge.</p><p><b>SHOW: 634</b></p><p><b>CLOUD NEWS OF THE WEEK - </b><a href='http://bit.ly/cloudcast-cnot ... Show More
35m 37s
Aug 15
Measuring AI code assistants and agents with the AI Measurement Framework
In this episode of Engineering Enablement, DX CTO Laura Tacho and CEO Abi Noda break down how to measure developer productivity in the age of AI using DX’s AI Measurement Framework. Drawing on research with industry leaders, vendors, and hundreds of organizations, they explain ho ... Show More
41m 14s
Jan 2025
3164: Breaking Data Silos: How Hammerspace is Powering AI Storage and Hybrid Cloud
<p>As part of the IT Press Tour in Silicon Valley, I had the opportunity to sit down with David Flynn, CEO of Hammerspace, to explore how the company is redefining the future of enterprise data storage.</p> <p>At a time when AI-driven workloads and hybrid cloud computing are push ... Show More
24m 26s
Sep 15
#321 Developing Financial AI Products at Experian with Vijay Mehta, EVP of Global Solutions & Analytics at Experian
Financial institutions are racing to harness the power of AI, but the path to implementation is filled with challenges. From feature engineering to model deployment, the technical complexities of AI adoption in finance require careful navigation of both technological and regulato ... Show More
49m 28s
Jan 2024
Beyond the DORA metrics: Measuring engineering excellence
<p>Is it really possible to measure the impact engineering teams have on a business' success? At a time when growth is challenging for many organizations and questions about productivity and effectiveness dominate industry conversations, getting it right is crucial. And although ... Show More
35m 31s