32 K Token Context
Voxtral keeps entire meetings or podcasts in memory, producing coherent transcripts and accurate summaries without sliding‑window workarounds.
Voxtral is an open‑source speech AI that delivers real‑time transcription, 32 K‑token context and voice‑driven function calling — all in one API.
Stream live audio to Voxtral and receive partial transcripts in < 300 ms — perfect for captions, live events and voice bots.
Let users say "Create a task for tomorrow" and watch Voxtral emit a JSON payload your backend can execute. From voice to action in a single step.
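As a rough illustration of the voice-to-action step, the sketch below routes a function-call payload to a backend handler. The payload shape (`name` plus `arguments`), the `create_task` handler and the `dispatch` helper are all hypothetical and only stand in for whatever schema your integration actually receives.

```python
import json
from datetime import date, timedelta

# Hypothetical payload; the real schema emitted by Voxtral may differ.
EXAMPLE_PAYLOAD = (
    '{"name": "create_task", '
    '"arguments": {"title": "Follow up", "due": "tomorrow"}}'
)


def create_task(title: str, due: str, today: date) -> dict:
    """Hypothetical backend handler: resolve the due date and return a task record."""
    if due == "tomorrow":
        due_date = today + timedelta(days=1)
    else:
        due_date = date.fromisoformat(due)
    return {"title": title, "due": due_date.isoformat()}


# Map function names from the payload to backend handlers.
HANDLERS = {"create_task": create_task}


def dispatch(payload_json: str, today: date) -> dict:
    """Parse the JSON payload and call the matching handler."""
    call = json.loads(payload_json)
    handler = HANDLERS.get(call["name"])
    if handler is None:
        raise ValueError(f"unknown function: {call['name']}")
    return handler(today=today, **call["arguments"])
```

Passing `today` explicitly keeps the handler deterministic, which makes it easier to test than reading the clock inside the function.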
Voxtral auto‑detects language, speaker and sentiment, then delivers bilingual SRT/VTT files for global distribution.
Run Voxtral on‑prem for compliance or use our managed cloud — you own the data either way.
Test every Voxtral feature with no credit card. Scale to millions of minutes using transparent, pay‑as‑you‑go pricing.
From transcription to automation, solve all your voice processing needs in one place
Voxtral auto‑detects 100+ languages and exports perfectly synced SRT/VTT subtitles.
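For readers unfamiliar with the subtitle format, here is a minimal sketch of how timed transcript segments map onto SRT cues. The `(start, end, text)` segment tuples are an assumed input shape, not Voxtral's documented output; only the SRT layout itself (cue index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is standard.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render (start_seconds, end_seconds, text) tuples as SRT cues.

    The segment shape is an assumption made for this example.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

A bilingual cue can be produced by putting both languages in `text`, one per line. WebVTT differs mainly in its `WEBVTT` header line and in using a period instead of a comma before the milliseconds.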
Use Voxtral's function calling to convert speech into JSON actions for ordering, search or document creation.
Process 30‑minute podcasts in batch or stream live audio via WebSocket to get partial transcripts in real time.
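Streaming clients usually send audio in small fixed-duration frames rather than one large buffer. The sketch below shows one way to slice raw PCM into such frames before handing them to a WebSocket client; the 16 kHz, 16-bit mono format and the 100 ms frame size are assumptions chosen for illustration, not documented Voxtral requirements.

```python
from typing import Iterator


def pcm_frames(
    pcm: bytes,
    sample_rate: int = 16_000,   # assumed sample rate in Hz
    frame_ms: int = 100,         # assumed frame duration
    bytes_per_sample: int = 2,   # 16-bit mono
) -> Iterator[bytes]:
    """Yield consecutive fixed-duration frames from a raw PCM buffer.

    The final frame may be shorter if the buffer does not divide evenly.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * bytes_per_sample
    for offset in range(0, len(pcm), frame_bytes):
        yield pcm[offset:offset + frame_bytes]
```

Each yielded frame would then be sent as one binary message, with partial transcripts read back as they arrive.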
Record or upload audio and watch Voxtral generate accurate transcripts, summaries and action triggers in seconds.
Upload your audio file and let our AI provide transcription, analysis, and insights
Supported: MP3, WAV, M4A, FLAC, OGG (Max 50MB)
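If you upload programmatically, a client-side check against these limits can save a failed request. This is a minimal sketch: `check_upload` is a hypothetical helper, and it assumes "50MB" means 50 × 1024 × 1024 bytes, which the page does not specify.

```python
from pathlib import Path

# Formats listed on the page; matched by file extension only.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg"}
# Assumption: "50MB" is interpreted as 50 MiB.
MAX_BYTES = 50 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems with the file; an empty list means it looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append("unsupported format")
    if size_bytes > MAX_BYTES:
        problems.append("file too large")
    return problems
```

The check looks only at the extension and size, so a server may still reject a file whose contents do not match its name.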
See how Voxtral powers voice needs across different industries
Turn call recordings into searchable Voxtral transcripts and auto‑tag emotion, saving agents 40% of lookup time.
Voxtral detects speakers and languages, then creates bilingual captions and key‑point summaries for global audiences.
Everything you need to know about Voxtral
Voxtral is an open‑source speech AI engine offering real‑time transcription, summaries and voice‑triggered automation.