TTS MCP

Local text-to-speech for AI agents, powered by Mistral's Voxtral on Apple Silicon


TTS MCP runs Mistral’s Voxtral text-to-speech model locally, with real-time streaming playback, an interactive CLI, a FastAPI server, and an MCP server that lets Claude Code and Claude Desktop speak. It ships 20 voices across 9 languages and several quantization options, with no cloud dependency.

Sample output: German, male voice

Quick Start

Requires an Apple Silicon Mac (though the project is easily adaptable to Linux), Python 3.12+, uv, and just. Clone the GitHub repository, install the dependencies, and download a TTS model as follows:

git clone https://github.com/florianbuetow/tts-mcp.git
cd tts-mcp
just init        # install dependencies
just download    # choose and download a Voxtral model

The just download step lets you pick from three quantization variants of Voxtral 4B, trading size against speed and quality:

Model                          Size      Speed
Voxtral-4B-TTS-2603-mlx-4bit   ~2.5 GB   Fastest (RTF below 1.0x)
Voxtral-4B-TTS-2603-mlx-6bit   ~3.5 GB   Balanced (RTF ~1.1x)
Voxtral-4B-TTS-2603-mlx-bf16   ~8.0 GB   Highest quality (RTF ~6.3x)
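To make the RTF figures concrete, here is a small sketch of the arithmetic. It assumes the conventional definition of real-time factor (synthesis time divided by the duration of the audio produced), which this page does not spell out. The 4-bit variant is only listed as "below 1.0x", so 1.0 is used as an upper bound. The function name synthesis_seconds is made up for this illustration.

```python
# Assumption: RTF = synthesis time / duration of the generated audio,
# so lower values mean faster synthesis.
# The 4-bit figure is only given as "below 1.0x"; 1.0 is an upper bound.
RTF_BY_MODEL = {
    "Voxtral-4B-TTS-2603-mlx-4bit": 1.0,
    "Voxtral-4B-TTS-2603-mlx-6bit": 1.1,
    "Voxtral-4B-TTS-2603-mlx-bf16": 6.3,
}


def synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate the time needed to synthesize audio of the given length."""
    return audio_seconds * rtf


# Rough estimate for a 10-second utterance with each variant.
for model, rtf in RTF_BY_MODEL.items():
    estimate = synthesis_seconds(10.0, rtf)
    print(f"{model}: about {estimate:.1f} s to synthesize 10 s of audio")
```

Under that definition, only a variant with an RTF below 1.0 produces audio faster than it plays back.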

Next, create or edit config.yaml in the project root to adjust the defaults. The default_voice parameter sets the voice, and save_wav controls whether audio synthesized through the API is saved to data/output/ as a timestamped WAV file.

model: ./data/models/Voxtral-4B-TTS-2603-mlx-6bit
models_dir: ./data/models
sample_rate: 24000
default_voice: de_male
save_wav: true
simplify_punctuation: false
normalize_audio: true
target_lufs: -20.0
true_peak_ceiling_db: -1.0
min_duration_seconds: 0.5
host: 0.0.0.0
port: 12000
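As an illustration of how these settings could be checked before starting the server, here is a minimal sketch that reads a flat key: value file like the one above and validates a few fields. It is not the project's own loader. TTSConfig, parse_flat_yaml and load_config are hypothetical names, and the tiny parser only handles flat mappings, so a real setup would use a proper YAML library.

```python
from dataclasses import dataclass


@dataclass
class TTSConfig:
    """Hypothetical container for a subset of the config.yaml keys."""
    model: str
    default_voice: str
    save_wav: bool
    sample_rate: int
    port: int


def parse_flat_yaml(text: str) -> dict[str, str]:
    """Parse flat 'key: value' lines. This is not a general YAML parser."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, _, value = line.partition(":")
        values[key.strip()] = value.strip()
    return values


def load_config(text: str) -> TTSConfig:
    """Build a TTSConfig and reject obviously invalid values."""
    raw = parse_flat_yaml(text)
    if raw["save_wav"] not in ("true", "false"):
        raise ValueError("save_wav must be true or false")
    port = int(raw["port"])
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return TTSConfig(
        model=raw["model"],
        default_voice=raw["default_voice"],
        save_wav=raw["save_wav"] == "true",
        sample_rate=int(raw["sample_rate"]),
        port=port,
    )
```

Calling load_config(open("config.yaml").read()) on the file above would give an object with save_wav set to True and port set to 12000.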

With the model and config in place, you can start the REST API server or launch the interactive chat:

just serve    # start the FastAPI server: POST /say, GET /voices, GET /status/{id}
just chat     # start the interactive CLI chat
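Once the server is running, a client can call it over HTTP. The sketch below uses only the Python standard library and targets POST /say on port 12000 from the config above. The JSON body shape (text and voice fields) and the JSON response are assumptions, not taken from the project's API documentation, so check the README or the FastAPI docs page before relying on them. build_say_request and say are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:12000"  # port taken from config.yaml above


def build_say_request(text: str, voice: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /say request without sending it."""
    # Assumption: the endpoint accepts a JSON body with "text" and "voice".
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/say",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def say(text: str, voice: str = "de_male") -> dict:
    """Send the request; assumes the server answers with a JSON object."""
    request = build_say_request(text, voice)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via just serve, say("Hello, it is nice to see you today!") would submit the text for synthesis. If the response includes a job id, it could be polled through GET /status/{id}.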

The just chat command loads the model in-process, lets you pick a voice, then synthesizes whatever you type (Enter twice submits, ESC twice quits the REPL):

=== Running Interactive Chat ===
Using model: Voxtral-4B-TTS-2603-mlx-6bit

Available voices:
  1. ar_male
  2. casual_female
  3. casual_male
  ...
  20. pt_male

Select voice [1-20]: 6
Loading model: data/models/Voxtral-4B-TTS-2603-mlx-6bit
Voice: de_male
Type text. Enter twice submits (single Enter = newline). Enter twice on empty input or ESC twice quits.
Text: Hello, it is nice to see you today!

To use TTS from your AI assistant, install the MCP bridge by running cd mcp && npm install inside the cloned project, then register the MCP server in your assistant's configuration. The easiest way is to ask the assistant to do it for you:

Prompt your AI

“I need your help to configure the tts-mcp server for codex app, claude desktop and claude cli. There is an example on how to configure it in README.md. Please use it.”

Supported Voices

Pick from 20 voices across 9 languages. Pass any as the voice parameter, or choose one interactively in the CLI:

Language     Voices
English      casual_female, casual_male, cheerful_female, neutral_female, neutral_male
German       de_female, de_male
French       fr_female, fr_male
Spanish      es_female, es_male
Italian      it_female, it_male
Dutch        nl_female, nl_male
Portuguese   pt_female, pt_male
Hindi        hi_female, hi_male
Arabic       ar_male
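If you pass voice names from your own code, it can help to validate them up front. The following sketch encodes the table above as a lookup. The voice identifiers come from the table, while validate_voice and language_of are hypothetical helpers and not part of the project.

```python
# Voice identifiers copied from the table above.
VOICES_BY_LANGUAGE = {
    "English": ["casual_female", "casual_male", "cheerful_female",
                "neutral_female", "neutral_male"],
    "German": ["de_female", "de_male"],
    "French": ["fr_female", "fr_male"],
    "Spanish": ["es_female", "es_male"],
    "Italian": ["it_female", "it_male"],
    "Dutch": ["nl_female", "nl_male"],
    "Portuguese": ["pt_female", "pt_male"],
    "Hindi": ["hi_female", "hi_male"],
    "Arabic": ["ar_male"],
}

# Flat, alphabetically sorted list of every voice identifier.
ALL_VOICES = sorted(
    voice for voices in VOICES_BY_LANGUAGE.values() for voice in voices
)


def validate_voice(voice: str) -> str:
    """Return the voice unchanged, or raise if it is not in the table."""
    if voice not in ALL_VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {ALL_VOICES}")
    return voice


def language_of(voice: str) -> str:
    """Look up the language a voice belongs to."""
    for language, voices in VOICES_BY_LANGUAGE.items():
        if voice in voices:
            return language
    raise ValueError(f"unknown voice {voice!r}")
```

The sorted list has 20 entries, with de_male in sixth place and pt_male last, which lines up with the numbering in the interactive chat transcript shown earlier.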

Learn More

If you want to learn more about Mistral’s Voxtral text-to-speech models, click here. If you want to experiment with the models, here is a good place to start.