Feb 2025
1h 7m

Accelerating AI Training and Inference w...

Sam Charrington
About this episode

Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting Trainium's systolic array-based compute design and how it balances performance across key dimensions like compute, memory bandwidth, memory capacity, and network bandwidth. We also discuss the Trainium tooling ecosystem, including the Neuron SDK, Neuron Compiler, and Neuron Kernel Interface (NKI). We then dig into the various ways Trainium2 is offered, including Trn2 instances, UltraServers, and UltraClusters, as well as access through managed services like AWS Bedrock. Finally, we cover sparsity optimizations, customer adoption, performance benchmarks, support for Mixture of Experts (MoE) models, and what’s next for Trainium.


The complete show notes for this episode can be found at https://twimlai.com/go/720.
