Sep 2018
16m 22s

Data Engineering

Ben Jaffe and Katie Malone
About this episode
If you’re a data scientist, you know how important it is to keep your data orderly, clean, well-documented, and moving smoothly between different systems. There’s a ton of work that goes into building and maintaining databases and data pipelines. This job, owning and maintaining the data used for analytics, is often the realm of data engineers. Fr ...