Roll tape and prompt! In this episode of Two Voice Devs, Allen and Mark explore how Google’s new advanced prompting guidelines turn developers into voice directors for Gemini Text-to-Speech. Instead of coding rigid SSML tags, you can now establish a scene, write stage directions, and give "director's notes" to shape a base voice's gender, accent, style, and pacing.
Allen showcases a web app where he directs a single base voice to play two entirely different characters: a rough Brooklyn cab driver and a classic Southern belle. The hosts discuss using natural-language audio tags as cues for laughter, sighs, gasps, and more, and how these theatrical controls are coming alive in real time with Gemini Live and Gemini 3.1 Flash TTS.
Learn more:
* https://ai.google.dev/gemini-api/docs/speech-generation
[00:00:05] Welcome to Two Voice Devs
[00:00:27] Intro to Gemini Text-to-Speech and Advanced Prompting
[00:01:57] Moving Beyond SSML to Flexible Base Voices
[00:03:07] Prompting Genders and Accents (The Storytelling Analogy)
[00:04:40] Web App Demo: Zephyr as a Brooklyn Cab Driver vs. Southern Belle
[00:06:50] Building Multi-Voice Conversations with Stage Directions
[00:08:41] Using Natural Language Audio Tags for Expressive Cues
[00:11:02] Gemini Live Integration and Dynamic Tone Selection
[00:12:27] Model Details: Gemini 3.1 Flash TTS Preview and Release Info
[00:13:53] Wrap-up and Call for Feedback
Hashtags:
#GeminiTTS #TextToSpeech #GenerativeAI #GoogleDeepMind #GeminiLive #GeminiFlash #AIStudio #DeveloperTools #SpeechSynthesis #VoiceFirst #AdvancedPrompting
Episode 275