Jul 30
26m 27s

Improving AI Through Data Quality

Massive Studios
About this episode

Elliot Shmukler (@eshmu, Co-Founder/CEO @anomalo_hq) talks about the impact of data quality on AI, how unstructured data can be improved, and how monitoring of data lakes can help prevent model drift and give organizations confidence with predictable results.

SHOW: 945

SHOW TRANSCRIPT: The Cloudcast #945 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET 

CLOUD NEWS OF THE WEEK:  http://bit.ly/cloudcast-cnotw

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST:  "CLOUDCAST BASICS"

SPONSORS:

  • [DoIT] Visit doit.com (that’s d-o-i-t.com) to unlock intent-aware FinOps at scale with DoiT Cloud Intelligence.
  • [VASION] Vasion Print eliminates the need for print servers by enabling secure, cloud-based printing from any device, anywhere. Get a custom demo to see the difference for yourself.
  • [FCTR] Try FCTR.io (that's F-C-T-R dot io) free for 60 days. Modern security demands modern solutions. Check out FCTR's Tako AI, the first AI agent for Okta, on their website.

SHOW NOTES:

Topic 1 - Elliot, welcome back! It’s hard to believe it has been 3 years since we spoke! Give everyone a brief introduction.

Topic 2 - Here’s the problem I see when it comes to AI adoption today. There isn’t an “off the shelf” AI model with an organization's data built in; that’s impossible. So, you must bring this data, often unstructured, to the model, often with mixed results. Do you agree?

Topic 3 - I see data quality in two ways. The first is the quality of the data before ingestion: we want the data to be clean going in. But we also need a way to detect, mitigate, and do root cause analysis on quality issues along the way, correct? Give everyone an idea of what this life cycle looks like.

Topic 4 - What are you seeing as the barriers to adoption? Is it the tools, the models, the need for RAG pipelines, or the lack of data scientists and AIOps?

Topic 5 - We have this crossroads where proprietary data makes an organization unique, but exposing that unique data puts the organization at risk. How much of a factor does this play, and how do you advise organizations around this complex intersection?

Topic 6 - There is always this concept of predictable results: an answer should be consistent and repeatable. We’ve seen things like model/data drift and hallucinations hinder this, leading to a lack of confidence in the results. How do you advise organizations to tackle this lifecycle management and predictability over time?


FEEDBACK?
