Sep 2018
16m 22s

Data Engineering

Ben Jaffe and Katie Malone
About this episode
If you’re a data scientist, you know how important it is to keep your data orderly, clean, moving smoothly between different systems, well-documented… there’s a ton of work that goes into building and maintaining databases and data pipelines. This job, that of owner and maintainer of the data being used for analytics, is often the realm of data engineers. Fr ...
Up next
Yesterday
It's RAG time: Retrieval-Augmented Generation
Today we are going to talk about the feature with the worst acronym in generative AI: RAG, or Retrieval Augmented Generation. If you've ever used something like "Chat with My Docs," if you have an internal AI chatbot that has access to your company's documents, or you've created ...
17m 14s
Feb 23
Chasing Away Repetitive LLM Responses with Verbalized Sampling
One of the things that LLMs can be really helpful with is brainstorming or generating new creative content. They are called Generative AI, after all—not just for summarization and question-and-answer tasks. But if you use LLMs for creative generation, you may find that their outp ...
19m 12s
Feb 16
We're Back
It's been (*checks watch*) about five and a half years since we last talked. Fortunately nothing much has happened in the AI/data science world in that time. So let's just pick up where we left off, shall we? 
2m 58s
Recommended Episodes
Apr 2021
Moving Machine Learning Into The Data Pipeline at Cherre
Summary: Most of the time when you think about a data pipeline or ETL job what comes to mind is a purely mechanistic progression of functions that move data from point A to point B. Sometimes, however, one of those transformation ...
48m 5s
Nov 2023
#162 Scaling Data Engineering in Retail with Mohammad Sabah, SVP of Engineering & Data at Thrive Market
Poor data engineering is like building a shaky foundation for a house—it leads to unreliable information, wasted time and money, and even legal problems, making everything less dependable and more troublesome in our digital world. In the retail industry specifically, data enginee ...
51m 39s
Aug 2023
Unpacking The Seven Principles Of Modern Data Pipelines
Summary: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. If you're not careful you will end up spending all of your time on maintenance and fire-fighting. The folks at Rivery distilled the seven principles of mod ...
47m 3s
Nov 2021
Data Quality Starts At The Source
Summary: The most important gauge of success for a data platform is the level of trust in the accuracy of the information that it provides. In order to build and maintain that trust it is necessary to invest in defining, monitori ...
58m 55s
Feb 2020
Data Modeling That Evolves With Your Business Using Data Vault
Summary: Designing the structure for your data warehouse is a complex and challenging process. As businesses deal with a growing number of sources and types of information that they need to integrate, they need a data modeling st ...
1h 6m
Jun 2020
Bringing Business Analytics To End Users With GoodData
Summary: The majority of analytics platforms are focused on use internal to an organization by business stakeholders. As the availability of data increases and overall literacy in how to interpret it and take action improves ther ...
52m 24s
Apr 2022
#83 Empowering the Modern Data Analyst
As data volumes grow and become ever more complex, the role of the data analyst has never been more important. At the disposal of the modern data analyst are tools that reduce time to insight and increase collaboration. However, as the tools of a data analyst evolve, so do the ...
37m 1s