API Docs

Studio plans include API access. Use an API key created in your account settings.

Authentication

Include your API key in the X-API-Key header.

X-API-Key: <your_api_key>

POST /api/tts

Generate speech from text. Returns raw audio bytes with the appropriate Content-Type header.

{
  "text": "Hello world",
  "voice": "Lauren",
  "format": "mp3",
  "speed": 1.0,
  "pitch": 1.0,
  "sampleRate": 24000
}

Parameters

Field	Type	Required	Description
`text`	string	Yes	The text to convert to speech (max 10,000 chars on Studio).
`voice`	string	No	Voice ID. Defaults to "Lauren". See voice list below.
`format`	string	No	"mp3" (default) or "wav".
`speed`	number	No	Speech speed, 0.5 to 2.0. Default 1.0.
`pitch`	number	No	Pitch adjustment, 0.5 to 2.0. Default 1.0.
`sampleRate`	number	No	Audio sample rate, 16000 to 48000. Default 24000.
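As a rough illustration, the ranges in the table above can be checked on the client before a request is sent. This is a sketch, not part of the API: the function name `validate_tts_params` is hypothetical, and treating an empty `text` as invalid is an assumption. It does not check text length.

```python
def validate_tts_params(params: dict) -> list:
    """Return a list of problems found in a /api/tts request body (hypothetical helper)."""
    problems = []

    # `text` is the only required field; rejecting an empty string is an assumption.
    text = params.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required and must be a non-empty string")

    # `format` defaults to "mp3" when omitted.
    if params.get("format", "mp3") not in ("mp3", "wav"):
        problems.append('format must be "mp3" or "wav"')

    # Numeric fields are optional; the ranges are taken from the parameter table.
    ranges = (
        ("speed", 0.5, 2.0),
        ("pitch", 0.5, 2.0),
        ("sampleRate", 16000, 48000),
    )
    for field, low, high in ranges:
        value = params.get(field)
        if value is None:
            continue
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            problems.append(f"{field} must be a number from {low} to {high}")

    return problems
```

An empty list means the body passed these local checks; the API may still reject it for other reasons (plan, voice tier, limits).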

Response

On success, returns 200 with raw audio bytes. The Content-Type header is audio/mpeg for MP3 or audio/wav for WAV. The X-Chars-Used header shows how many characters were consumed.
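A minimal sketch of the full round trip, using only the Python standard library. Several details here are assumptions rather than documented facts: `BASE_URL` is a placeholder for the real host, the `Content-Type: application/json` request header is assumed from the JSON body shown above, and the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder host (assumption): replace with the actual API base URL.
BASE_URL = "https://api.example.com"


def build_tts_request(api_key, text, voice="Lauren", audio_format="mp3"):
    """Build a POST /api/tts request without sending it."""
    body = json.dumps({"text": text, "voice": voice, "format": audio_format})
    return urllib.request.Request(
        f"{BASE_URL}/api/tts",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "X-API-Key": api_key,
            # Assumed: the docs show a JSON body but do not name this header.
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key, text, out_path, **options):
    """Send the request, save the audio bytes, and return the two response headers."""
    request = build_tts_request(api_key, text, **options)
    with urllib.request.urlopen(request) as response:
        content_type = response.headers.get("Content-Type")
        chars_used = response.headers.get("X-Chars-Used")
        with open(out_path, "wb") as audio_file:
            audio_file.write(response.read())
    return content_type, chars_used
```

For example, `synthesize(key, "Hello world", "hello.mp3")` would write the MP3 bytes to `hello.mp3`. Non-2xx responses raise `urllib.error.HTTPError`, so callers can inspect the status code.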

Using Cloned Voices

Studio plan users with cloned voices can use them via the API. Use the clone-{name} format, where {name} is the name you gave your clone (lowercase, hyphens for spaces).

// If your clone is named "Joshua":
{
  "text": "Hello from my cloned voice",
  "voice": "clone-joshua"
}

// If your clone is named "My Narrator":
{
  "text": "Welcome to the show",
  "voice": "clone-my-narrator"
}

You can also use the clone UUID returned when the voice was created: clone-{uuid}. Find your clone voice IDs in your account dashboard under Voice Cloning.
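The naming rule above (lowercase, hyphens for spaces) can be sketched as a small helper. The function name `clone_voice_id` is hypothetical, and collapsing repeated or surrounding whitespace into a single hyphen is an assumption; names with punctuation or other special characters are not covered by the rule as documented, so prefer the clone-{uuid} form for those.

```python
def clone_voice_id(name: str) -> str:
    """Turn a clone's display name into a clone-{name} voice ID (hypothetical helper)."""
    # split() drops leading/trailing whitespace and collapses runs of spaces (assumption).
    return "clone-" + "-".join(name.lower().split())


print(clone_voice_id("Joshua"))       # clone-joshua
print(clone_voice_id("My Narrator"))  # clone-my-narrator
```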

Audio Markup (Delivery Style)

Pro and Studio plans can use audio markup tags to control delivery style. Place one tag at the start of your text for best results. Tags are not counted toward your character limit. English only. This is an experimental feature.

// Emotion / delivery tags (place at start of text):
[happy]  [sad]  [angry]  [excited]  [whisper]
[calm]  [scared]  [friendly]  [professional]  [sarcastic]

// Example request:
{
  "text": "[excited] Check out this amazing new feature!",
  "voice": "Sarah"
}

Tip: For best results, use one tag per request. If you need different emotions for different parts, split them into separate API calls.
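One way to follow this tip is to prepare one request body per emotion and send them as separate calls. The sketch below only builds the bodies; the helper name `build_tagged_requests` is hypothetical, and joining the resulting audio clips is left to the caller.

```python
# Delivery tags listed in this section.
DELIVERY_TAGS = {
    "happy", "sad", "angry", "excited", "whisper",
    "calm", "scared", "friendly", "professional", "sarcastic",
}


def build_tagged_requests(segments, voice):
    """Build one /api/tts body per (tag, text) pair, with the tag at the start of the text."""
    bodies = []
    for tag, text in segments:
        if tag not in DELIVERY_TAGS:
            raise ValueError(f"unknown delivery tag: {tag}")
        bodies.append({"text": f"[{tag}] {text}", "voice": voice})
    return bodies


bodies = build_tagged_requests(
    [
        ("excited", "Check out this amazing new feature!"),
        ("calm", "It is available today."),
    ],
    voice="Sarah",
)
```

Each item in `bodies` is then sent as its own POST /api/tts request.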

Sound Effects

Insert non-verbal sounds inline in your text. Sound effects are available on all plans and produce actual audio in the output.

// Available sound effect tags:
[laughing]  [cough]  [sigh]  [gasp]  [groan]
[yawn]  [sniff]  [clap]  [shush]  [hmm]
[ugh]  [cheer]

// Example:
{
  "text": "[laughing] That was hilarious! [sigh] Okay, back to work.",
  "voice": "Alex"
}

Word Timestamps / SRT Subtitles

Pro and Studio plans can request word-level timestamps. The response includes an X-Timestamps header with JSON-encoded timing data. In the web interface, an SRT subtitle file can also be downloaded alongside the audio.

// To request timestamps, add the header:
X-Include-Timestamps: true

// The response will include:
// - Audio bytes in the body
// - X-Timestamps header with JSON array of word timings
// - X-Chars-Used header with character count

// Timestamp format:
[
  { "word": "Hello", "start": 0.0, "end": 0.42 },
  { "word": "world", "start": 0.45, "end": 0.87 }
]
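If you need subtitles from an API response, the word timings can be grouped into SRT cues on the client. This is a sketch under assumptions, not the service's own SRT output: the function names and the `words_per_cue` grouping are hypothetical, and the input is assumed to be the X-Timestamps header value in the format shown above.

```python
import json


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def timestamps_to_srt(header_value: str, words_per_cue: int = 7) -> str:
    """Group word timings from the X-Timestamps header into numbered SRT cues."""
    words = json.loads(header_value)
    cues = []
    for offset in range(0, len(words), words_per_cue):
        group = words[offset:offset + words_per_cue]
        index = offset // words_per_cue + 1
        start = format_srt_time(group[0]["start"])
        end = format_srt_time(group[-1]["end"])
        text = " ".join(word["word"] for word in group)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```

Grouping by a fixed word count is the simplest choice; splitting on punctuation or on pauses between words may read better for longer text.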

Available Voices

Voices are organized by tier. Your plan determines which voices you can access via the API. Use the voice name as the voice parameter.

Tier	Plan Required	Example Voices
Free	All plans	Lauren, Elliot, Blake, Luna, Mia, Oliver, Ethan, Claire, Dennis, Sophie
Starter	Starter+	Alex, Timothy, Ashley, Liam, Chloe, Carter, Jessica, Julia, Priya, Hana
Pro	Pro+	Theodore, Shaun, Edward, Deborah, Wendy, Craig, Elizabeth, James, Felix, Serena
Studio	Studio only	Sarah, Hunter (Mark), Benedict (Clive), Dominus, Hades, Pixie, + all non-English voices

For the full voice list with language, gender, and accent metadata, call GET /api/voice-metadata (public, no auth required).

Rate Limits

Limit	Value
Requests per minute	30
Max characters per request	20,000 (Studio)
Monthly character limit	Per plan (750K for Studio). Top-ups available.
Concurrent requests	5

If you hit a rate limit, the API returns 429 with a Retry-After header.
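A minimal retry sketch for 429 responses, assuming the Retry-After value is a number of seconds (the header format is not specified here, so anything else falls back to a default delay). The names `retry_delay` and `send_with_retry` are hypothetical, and `send` is any zero-argument callable that performs the request with `urllib` and raises `urllib.error.HTTPError` on failure.

```python
import time
import urllib.error


def retry_delay(retry_after, default=2.0):
    """Parse Retry-After as seconds (assumption); use the default if missing or unparsable."""
    try:
        return max(0.0, float(retry_after))
    except (TypeError, ValueError):
        return default


def send_with_retry(send, max_attempts=3, sleep=time.sleep):
    """Call send(), waiting and retrying when the API answers 429."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a rate limit, and the final 429.
            if err.code != 429 or attempt == max_attempts:
                raise
            sleep(retry_delay(err.headers.get("Retry-After")))
```

Retrying only helps with the per-minute and concurrency limits. If the 429 is caused by an exhausted monthly character limit, further attempts will keep failing until the limit resets or you add a top-up.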

Voice Design via API

Voice Design (creating a custom voice from a text description) is currently available through the web interface only. Designed voices are automatically saved to your account and can then be used in API requests with the clone-{name} voice ID format, just like cloned voices. Voice design shares the same 3-slot limit as voice cloning.

Error Codes

Status	Error	Description
400	CloneNotFound	The requested clone voice was not found in your account.
401	Unauthorized	Missing or invalid API key.
402	PaymentRequired	Feature requires a higher plan (e.g., voice cloning requires Studio).
403	Forbidden	Access denied (e.g., the account is banned or the resource belongs to another user).
429	LimitExceeded	Monthly character limit or rate limit exceeded.
500	TTSProviderError	TTS service error. Try again or use a different voice.