Aug 20
25m 11s

Kubernetes in the Era of GPUs

Massive Studios
About this episode

Haseeb Budhani (@haseebbudhani, CEO @rafaysystemsinc) discusses the evolution from traditional DevOps to platform engineering and what "Enterprise Ready" Kubernetes looks like in 2025. We explore AI workloads running on Kubernetes and how modern orchestration solutions can transform teams from bottlenecks into enablers. We also cover the security considerations for GPU-enabled AI workloads and balancing developer self-service capabilities with proper governance and control.

SHOW: 950

SHOW TRANSCRIPT: The Cloudcast #950 Transcript

SHOW VIDEO: https://youtube.com/@TheCloudcastNET 

NEW TO CLOUD? CHECK OUT OUR OTHER PODCAST: "CLOUDCAST BASICS"

SPONSORS:

  • [DoIT] Visit doit.com (that’s d-o-i-t.com) to unlock intent-aware FinOps at scale with DoiT Cloud Intelligence.
  • [VASION] Vasion Print eliminates the need for print servers by enabling secure, cloud-based printing from any device, anywhere. Get a custom demo to see the difference for yourself.

SHOW NOTES:

Topic 1 - Welcome to the show, Haseeb. Give everyone a quick introduction.

Topic 2 - Let’s start by talking about the evolution of Kubernetes as a platform. You’ve said, and we’ve discussed on this show for some time, that Kubernetes is more of a platform for running platforms. In recent years we’ve also seen industry trends shift what it means to do DevOps or Platform Engineering. You’ve positioned Rafay as a Kubernetes Operations Platform that has now evolved into a Cloud Automation Platform. How do you define the difference between Kubernetes management and true platform engineering?

Topic 3 - What does “Enterprise Ready” Kubernetes look like in 2025?

Topic 4 - Let’s flip over to AI/ML and GPUs with Kubernetes for a bit. Many developers and data scientists aren’t aware of the underlying platform they run on. I saw a stat recently that about 95% of AI runs on Kubernetes, either on-prem or in the cloud. Despite this, Platform teams are often stuck doing manual GPU provisioning, which doesn't scale with AI adoption. How do modern GPU orchestration solutions change the platform team's role?

Topic 5 - With GPU workloads often handling sensitive data and AI models, security becomes even more critical. How should organizations approach security and compliance in their GPU-enabled Kubernetes operations?

Topic 6 - "Most developers don't want to write YAML or manage clusters — they just want to ship software." How do you balance giving developers the self-service capabilities they want while maintaining the control and governance that platform teams need?

