Roll tape and prompt! In this episode of Two Voice Devs, Allen and Mark explore how Google’s new advanced prompting guidelines turn developers into voice directors for Gemini Text-to-Speech. Instead of coding rigid SSML tags, you can now establish a scene, write stage directions, and give "director's notes" to shape a base voice's gender, accent, style, and pacing.

Allen showcases a web app where he directs a single base voice, Zephyr, to play two entirely different characters: a rough Brooklyn cab driver and a classic Southern belle. The hosts discuss using natural language audio tags as cues for laughter, sighs, gasps, and more, and how these theatrical controls are coming alive in real time with Gemini Live and Gemini 3.1 Flash TTS.

Learn more:

* https://ai.google.dev/gemini-api/docs/speech-generation

[00:00:05] Welcome to Two Voice Devs

[00:00:27] Intro to Gemini Text-to-Speech and Advanced Prompting

[00:01:57] Moving Beyond SSML to Flexible Base Voices

[00:03:07] Prompting Genders and Accents (The Storytelling Analogy)

[00:04:40] Web App Demo: Zephyr as a Brooklyn Cab Driver vs. Southern Belle

[00:06:50] Building Multi-Voice Conversations with Stage Directions

[00:08:41] Using Natural Language Audio Tags for Expressive Cues

[00:11:02] Gemini Live Integration and Dynamic Tone Selection

[00:12:27] Model Details: Gemini 3.1 Flash TTS Preview and Release Info

[00:13:53] Wrap-up and Call for Feedback

Hashtags:

#GeminiTTS #TextToSpeech #GenerativeAI #GoogleDeepMind #GeminiLive #GeminiFlash #AIStudio #DeveloperTools #SpeechSynthesis #VoiceFirst #AdvancedPrompting

Episode 275