Aug 29
25m 40s

Episode 253 - The Future of Voice? Explo...

Mark and Allen
About this episode

In this episode of Two Voice Devs, Mark and Allen dive into the new experimental Text-to-Speech (TTS) model in Google's Gemini 2.5. They explore its capabilities, from single-speaker to multi-speaker audio generation, and discuss how it's a significant leap from the old days of SSML. They also touch on how this new technology can be integrated with LangChainJS to create more dynamic and natural-sounding voice applications. Is this the return of voice as the primary interface for AI?


[00:00:00] Introduction

[00:00:45] Google's new experimental TTS model for Gemini

[00:01:55] Demo of single-speaker TTS in Google's AI Studio

[00:03:05] Code walkthrough for single-speaker TTS

[00:04:30] Lack of fine-grained control compared to SSML

[00:05:15] Using text cues to shape the TTS output

[00:06:20] Demo of multi-speaker TTS with a script

[00:09:50] Code walkthrough for multi-speaker TTS

[00:11:30] The model is tuned for TTS, not general conversation

[00:12:10] Using a separate LLM to generate a script for the TTS model

[00:13:30] Code walkthrough of the two-function approach with LangChainJS

[00:16:15] LangChainJS integration details

[00:19:00] Is Speech Markdown still relevant?

[00:21:20] Latency issues with the current TTS model

[00:22:00] Caching strategies for TTS

[00:23:30] Voice as the natural UI for AI

[00:25:30] Outro


#Gemini #TTS #VoiceAI #VoiceFirst #AI #Google #LangChainJS #LLM #Developer #Podcast
