Voxtral — Make Voice Instantly Useful

Voxtral is an open‑source speech AI that delivers real‑time transcription, a 32K‑token context window and voice‑driven function calling, all in one API.

Why Developers Choose Voxtral

32K‑Token Context

Voxtral keeps entire meetings or podcasts in memory, producing coherent transcripts and accurate summaries without window‑sliding hacks.

Real‑Time WebSocket API

Stream live audio to Voxtral and receive partial transcripts in < 300 ms — perfect for captions, live events and voice bots.
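
Streaming clients usually send audio in small, fixed-duration frames rather than one large upload. The sketch below shows one way to slice raw PCM audio into such frames before writing them to a socket. The function name `pcm_chunks`, the 16 kHz / 16-bit mono defaults and the 100 ms frame length are illustrative assumptions, not documented Voxtral requirements, and the actual WebSocket connection is left out.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumption: 16 kHz input
    sample_width: int = 2,      # assumption: 16-bit samples
    channels: int = 1,          # assumption: mono
    chunk_ms: int = 100,        # assumption: 100 ms per frame
) -> Iterator[bytes]:
    """Yield successive fixed-duration slices of raw PCM audio."""
    # Work in whole audio frames so a chunk never splits a sample.
    frames_per_chunk = sample_rate * chunk_ms // 1000
    chunk_size = frames_per_chunk * sample_width * channels
    if chunk_size <= 0:
        raise ValueError("audio parameters must give a positive chunk size")
    for offset in range(0, len(pcm), chunk_size):
        # The last chunk may be shorter than chunk_size.
        yield pcm[offset:offset + chunk_size]
```

Each yielded chunk could then be sent as one binary WebSocket message. Shorter frames reduce latency at the cost of more messages.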

Function Calling from Speech

Let users say "Create a task for tomorrow" and watch Voxtral emit a JSON payload your backend can execute. From voice to action in a single step.
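
On the backend, a function-call payload is typically routed to a handler by name. The following is a minimal sketch of that routing. The payload shape (`name` plus `arguments`), the `create_task` handler and the `dispatch` helper are all hypothetical and chosen for illustration. The real payload format may differ.

```python
import json
from typing import Any, Callable, Dict


def create_task(title: str, due: str) -> str:
    """Hypothetical handler: a real backend would write to a task store."""
    return f"task created: {title} (due {due})"


# Registry of functions the voice interface is allowed to trigger.
HANDLERS: Dict[str, Callable[..., Any]] = {"create_task": create_task}


def dispatch(payload: str) -> Any:
    """Parse a JSON function call and run the matching handler."""
    call = json.loads(payload)
    if not isinstance(call, dict):
        raise ValueError("payload must be a JSON object")
    name = call.get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        # Reject anything that is not explicitly registered.
        raise ValueError(f"unknown function: {name!r}")
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return handler(**args)


# Assumed payload for the utterance "Create a task for tomorrow".
example = '{"name": "create_task", "arguments": {"title": "Follow up", "due": "tomorrow"}}'
result = dispatch(example)
```

Keeping an explicit registry means spoken input can only trigger functions the backend has opted in to.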

100+ Languages & Translation

Voxtral auto‑detects language, speaker and sentiment, then delivers bilingual SRT/VTT files for global distribution.

Open‑Source under Apache 2.0

Run Voxtral on‑prem for compliance or use our managed cloud — you own the data either way.

50 Free Minutes Monthly

Test every Voxtral feature with no credit card. Scale to millions of minutes using transparent, pay‑as‑you‑go pricing.

Key Voxtral Features

From transcription to automation, solve all your voice processing needs in one place

Multilingual Detection & Translation

Voxtral auto‑detects 100+ languages and exports perfectly synced SRT/VTT subtitles.
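
For readers who want to see what subtitle export involves, here is a small sketch that renders timed segments in the standard SubRip (SRT) layout: a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, then the caption text. The `(start, end, text)` segment tuples are an assumed input shape, not Voxtral's actual output format.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Render (start, end, text) segments as the body of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

A bilingual cue can be expressed by putting both languages in the text, one per line, for example `"Hello\nBonjour"`.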

Voice‑Triggered Automation

Use Voxtral's function calling to convert speech into JSON actions for ordering, search or document creation.

Batch & Streaming Modes

Process 30‑minute podcasts in batch or stream live audio via WebSocket to get partial transcripts in real time.
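
In streaming mode a client has to reconcile partial results with finalized ones: each partial replaces the previous partial, while a final result is appended for good. The sketch below illustrates that bookkeeping. The message fields `text` and `is_final` and the `TranscriptAssembler` class are hypothetical. The real stream may use different names or structure.

```python
from typing import Any, Dict, List


class TranscriptAssembler:
    """Keep a running transcript from partial and final stream messages."""

    def __init__(self) -> None:
        self.final_segments: List[str] = []  # text that will not change
        self.pending: str = ""               # latest partial hypothesis

    def feed(self, message: Dict[str, Any]) -> str:
        """Apply one message and return the transcript so far."""
        text = str(message.get("text", "")).strip()
        if message.get("is_final", False):
            if text:
                self.final_segments.append(text)
            self.pending = ""  # the partial has been superseded
        else:
            self.pending = text  # overwrite, never append, partials
        return self.current()

    def current(self) -> str:
        """Finalized text followed by the current partial, if any."""
        parts = self.final_segments + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

A caption display would call `feed` on every incoming message and render the returned string.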

See Voxtral in Action

Record or upload audio and watch Voxtral generate accurate transcripts, summaries and action triggers in seconds.

Audio Processor

Upload your audio file and let our AI provide transcription, analysis, and insights

Supported: MP3, WAV, M4A, FLAC, OGG (Max 50MB)
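
A client can check a file against these limits before uploading. The sketch below is one way to do that. The `check_upload` helper is hypothetical, and treating "50MB" as 50 × 1024 × 1024 bytes is an assumption, since the page does not say which definition of a megabyte applies.

```python
from pathlib import Path
from typing import List

# Formats listed as supported on this page.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
# Assumption: "50MB" is read as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks OK."""
    problems: List[str] = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes (max {MAX_BYTES})")
    return problems
```

This only inspects the file name and size. It does not verify that the contents are valid audio.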

Industry Use Cases Powered by Voxtral

See how Voxtral powers voice needs across different industries

Customer‑Service Knowledge Base

Turn call recordings into searchable Voxtral transcripts and auto‑tag emotion, saving agents 40% of lookup time.

Real-time call transcription

Auto key point extraction

Emotion analysis tags

Multi-dimensional search

Podcast Bilingual Subtitles

Voxtral detects speakers & languages, then creates bilingual captions and key‑point summaries for global audiences.

Multi-language detection

Timeline alignment

Auto chapter division

Keyword extraction

Voxtral FAQ — Everything You Need to Know

Voxtral is an open‑source speech AI engine offering real‑time transcription, summaries and voice‑triggered automation.