🐸 Coqui TTS with VITS Speed Control

Neural text-to-speech with pitch-preserving speed adjustment

🚀 Setup Instructions

1. Install Dependencies:

pip install TTS flask flask-cors torch soundfile numpy pydub librosa

2. Run the Enhanced Server:

python coqui_tts_server.py

3. Server will start at: https://tts-gcp.arthur.digital/

📦 Available Models

Select a model to load (VITS models support speed control):

🎭 Voice Selection

Choose a voice/speaker for the selected model:

🎛️ VITS Speed Control

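Adjust speech speed while preserving natural pitch. VITS models expose a length_scale parameter that stretches or compresses the predicted phoneme durations inside the model, so speed changes without the pitch shift or artifacts of resampling the finished audio.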

🚀 Speech Speed 0.90x

Fine-Tuned Speed Range: 0.5x (very slow) → 1.0x (normal) → 1.5x (fast)
Sweet Spots: 0.85x-0.95x for slightly slower speech, 0.75x-0.85x for clearer speech
Recommended: Start with 0.90x for subtle slowing
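
As a rough illustration of how a speed multiplier could be turned into a VITS length_scale: length_scale multiplies the predicted durations, so values above 1.0 slow speech down and values below 1.0 speed it up. The sketch below assumes the server takes the inverse of the requested speed and clamps it to the 0.5x-1.5x range listed above. The function name speed_to_length_scale and the exact mapping are assumptions for illustration, not taken from coqui_tts_server.py.

```python
def speed_to_length_scale(speed: float,
                          min_speed: float = 0.5,
                          max_speed: float = 1.5) -> float:
    """Convert a speed multiplier (1.0 = normal) into a VITS-style length_scale.

    Assumption: length_scale is the inverse of speed, because length_scale
    multiplies phoneme durations (longer durations mean slower speech).
    """
    # Keep the requested speed inside the supported range.
    clamped = max(min_speed, min(max_speed, speed))
    return 1.0 / clamped


if __name__ == "__main__":
    for speed in (0.5, 0.9, 1.0, 1.5):
        scale = speed_to_length_scale(speed)
        print(f"{speed:.2f}x -> length_scale {scale:.3f}")
```

Under this assumed mapping, the recommended 0.90x setting corresponds to a length_scale of about 1.11, that is, durations roughly 11% longer than normal.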

Text to Synthesize

Language

Custom Speaker ID: Override the voice selection with a custom speaker ID

Output Format

Voice Cloning (Optional)

📎 Upload reference audio for voice cloning (.wav, .mp3)

Upload 3-10 seconds of clear speech for voice cloning (works with the XTTS v2 model)
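
A reference clip can be checked against the 3-10 second guideline before uploading. The sketch below is one possible check using only Python's standard wave module, so it handles uncompressed .wav files only; an .mp3 would need to be converted first. The helper names wav_duration_seconds and is_valid_reference are hypothetical and not part of the server.

```python
import wave

# Duration bounds taken from the upload guideline above.
MIN_SECONDS = 3.0
MAX_SECONDS = 10.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_reference(path: str) -> bool:
    """True if the clip length falls within the 3-10 second guideline."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

This checks length only. Whether the clip counts as "clear speech" (low noise, single speaker) still has to be judged by ear.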

🎵 Generated Audio