Captions
98.5% accurate. Under 200ms. 50+ languages.
AI-powered captioning that runs in real time. Deepgram nova-3 for live streams, Cohere Transcribe for VOD. Speaker identification, translation, and FCC-compliant export, all automatic.
Captioning that keeps up with your speakers
From raw audio to broadcast-compliant captions. Live and VOD, in any language, with speaker identification.
Sub-200ms live captions
Deepgram nova-3 processes speech in under 200ms. Viewers see captions while the speaker is still talking.
50+ languages
Auto-detect source language. Translate to 50+ target languages simultaneously. Viewers choose their preferred language.
Speaker identification
Automatic diarization labels who is speaking. Custom names mapped to voice profiles. Color-coded per speaker.
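As a rough illustration of the mapping described above, the sketch below attaches a custom name and a stable color to each diarized segment. The segment shape (`speaker`, `text`), the `nameMap` argument, the `labelSpeakers` helper, and the color palette are all assumptions made for this example, not part of the WAVE API.

```javascript
// Hypothetical palette; the colors WAVE actually uses are not documented here.
const PALETTE = ['#ffd166', '#06d6a0', '#118ab2', '#ef476f'];

/**
 * Attach a display name and a stable color to each diarized segment.
 * @param {{speaker: string, text: string}[]} segments - assumed segment shape
 * @param {Record<string, string>} nameMap - diarization label -> custom name
 */
function labelSpeakers(segments, nameMap = {}) {
  const colors = new Map(); // diarization label -> color, in order of first appearance
  return segments.map((seg) => {
    if (!colors.has(seg.speaker)) {
      colors.set(seg.speaker, PALETTE[colors.size % PALETTE.length]);
    }
    return {
      ...seg,
      displayName: nameMap[seg.speaker] ?? seg.speaker, // fall back to the raw label
      color: colors.get(seg.speaker),
    };
  });
}

const labeled = labelSpeakers(
  [
    { speaker: 'Speaker 1', text: 'Welcome back.' },
    { speaker: 'Speaker 2', text: 'Thanks for having me.' },
    { speaker: 'Speaker 1', text: 'Let us begin.' },
  ],
  { 'Speaker 1': 'Host' },
);
```

Unmapped speakers keep their diarization label, and a speaker keeps the same color for the whole session because the color is assigned on first appearance.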
AI post-processing
Auto-punctuation, profanity filtering, grammar correction, and terminology enforcement. Domain-specific vocabularies for medical, legal, and technical content.
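To make terminology enforcement and profanity filtering concrete, here is a minimal sketch of both as plain string passes. The `postProcess` function, its `terms` and `blocked` options, and the masking style are hypothetical and only illustrate the idea; they do not describe how the WAVE pipeline is implemented.

```javascript
// Escape regex metacharacters so vocabulary entries are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

/**
 * @param {string} text - raw transcript text
 * @param {{terms?: Record<string, string>, blocked?: string[]}} options
 *   terms: misheard variant -> preferred spelling (assumed vocabulary format)
 *   blocked: words to mask
 */
function postProcess(text, { terms = {}, blocked = [] } = {}) {
  let out = text;
  // Terminology enforcement: replace each variant with its preferred spelling.
  for (const [variant, preferred] of Object.entries(terms)) {
    const pattern = new RegExp(`\\b${escapeRegExp(variant)}\\b`, 'gi');
    out = out.replace(pattern, () => preferred);
  }
  // Profanity filtering: keep the first letter and mask the rest.
  for (const word of blocked) {
    const pattern = new RegExp(`\\b${escapeRegExp(word)}\\b`, 'gi');
    out = out.replace(pattern, (match) => match[0] + '*'.repeat(match.length - 1));
  }
  return out;
}

const cleaned = postProcess('the patient has a fib and that darn cough', {
  terms: { 'a fib': 'AFib' },
  blocked: ['darn'],
});
// cleaned === 'the patient has AFib and that d*** cough'
```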
5 export formats
WebVTT (web), SRT (general), SCC (broadcast), EBU-STL (European broadcast), and burned-in video. Batch export entire libraries.
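The two text formats differ mainly in their header, cue numbering, and millisecond separator. The sketch below serializes the same cues to WebVTT and SRT; the cue shape (`startMs`, `endMs`, `text`) and the function names are assumptions for illustration, and timestamps are assumed to be non-negative integer milliseconds.

```javascript
// Format milliseconds as HH:MM:SS<sep>mmm. WebVTT uses '.', SRT uses ','.
function timestamp(ms, sep) {
  const pad = (n, width) => String(n).padStart(width, '0');
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms % 1000, 3)}`;
}

// WebVTT: a "WEBVTT" header line, then cues separated by blank lines.
function toWebVTT(cues) {
  const body = cues.map(
    (c) => `${timestamp(c.startMs, '.')} --> ${timestamp(c.endMs, '.')}\n${c.text}`,
  );
  return ['WEBVTT', ...body].join('\n\n') + '\n';
}

// SRT: no header, each cue preceded by a 1-based sequence number.
function toSRT(cues) {
  const body = cues.map(
    (c, i) => `${i + 1}\n${timestamp(c.startMs, ',')} --> ${timestamp(c.endMs, ',')}\n${c.text}`,
  );
  return body.join('\n\n') + '\n';
}

const cues = [
  { startMs: 1000, endMs: 3500, text: 'Welcome to the stream.' },
  { startMs: 3600, endMs: 6250, text: 'Captions are now live.' },
];
const vtt = toWebVTT(cues);
const srt = toSRT(cues);
```

SCC and EBU-STL are binary or byte-coded broadcast formats and need a dedicated encoder, so they are left out of this sketch.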
FCC and ADA compliant
Meets FCC closed captioning requirements and ADA accessibility standards. Automatic compliance reports for audit trails.
Two engines. One API.
The system picks the right engine for each job. You get speed for live and precision for VOD — without configuring anything.
Live engine
Deepgram nova-3
- Sub-200ms end-to-end latency
- 98.5% accuracy on first pass
- Streaming word-by-word display
- Real-time speaker diarization
- Live correction editor
VOD engine
Cohere Transcribe
- 99.2% accuracy with post-processing
- Zero API cost (Cohere free tier)
- Batch processing for entire libraries
- Paragraph segmentation
- Review queue before publish
Optional: Rev.com human captioning — 99.9% accuracy, 24-hour turnaround for archival and legal content
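The routing rule can be pictured as a small function over the content type. This is only a sketch: the job shape, the `humanReview` flag, the engine `id` strings, and the assumption that human captioning applies to VOD jobs only are all hypothetical.

```javascript
// Engine names come from this page; the `id` strings are made up for the example.
const ENGINES = {
  live: { id: 'deepgram-nova-3', name: 'Deepgram nova-3' },
  vod: { id: 'cohere-transcribe', name: 'Cohere Transcribe' },
  human: { id: 'rev-human', name: 'Rev.com human captioning' },
};

/**
 * Pick an engine from the content type.
 * @param {{type: 'live' | 'vod', humanReview?: boolean}} job - assumed job shape
 */
function selectEngine(job) {
  if (job.type === 'live') return ENGINES.live; // speed first
  if (job.type === 'vod') {
    // Assumption: the optional human add-on is requested per VOD job.
    return job.humanReview ? ENGINES.human : ENGINES.vod;
  }
  throw new Error(`Unknown content type: ${job.type}`);
}

const liveEngine = selectEngine({ type: 'live' });
const vodEngine = selectEngine({ type: 'vod' });
const archivalEngine = selectEngine({ type: 'vod', humanReview: true });
```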
Three steps to accessible content
Connect your stream or upload
Point any live stream at WAVE — RTMP, SRT, WebRTC, or NDI. For VOD, upload files or connect your storage bucket. Audio extracted automatically.
Configure language and output
Set source language (or auto-detect). Choose target translation languages. Pick export format: WebVTT for web, SCC for broadcast, SRT for general use. One stream, multiple caption tracks.
Captions appear automatically
Live captions render in the player within 200ms. VOD captions generate and enter the review queue. Export, embed, or burn in — your choice.
Built for these workflows
One API for all caption workflows
Start a caption session, get results via webhook or polling. Works with any stream or file.
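On the receiving side, a webhook consumer might look roughly like the sketch below. It assumes the delivery body has already been parsed from JSON and carries the `type`, `text`, and `speaker` fields shown in the payload comment in this section; the `handleCaptionEvent` name and the fallback speaker label are hypothetical.

```javascript
/**
 * Handle one parsed webhook delivery.
 * Returns a display line for caption segments, or null for other event types.
 */
function handleCaptionEvent(event) {
  if (!event || event.type !== 'caption.segment') return null; // ignore other events
  if (typeof event.text !== 'string') {
    throw new TypeError('caption.segment is missing a text field');
  }
  const speaker = event.speaker ?? 'Unknown speaker'; // diarization may be disabled
  return `${speaker}: ${event.text}`;
}

const line = handleCaptionEvent(
  JSON.parse('{"type":"caption.segment","text":"Hello everyone","speaker":"Speaker 1"}'),
);
// line === 'Speaker 1: Hello everyone'
```

Keeping the handler a pure function makes it easy to call from whatever HTTP framework serves the webhook route.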
// Start live captioning
const session = await wave.captions.start({
  streamId: 'str_abc123',
  languages: ['en', 'es', 'fr'],
  speakerDiarization: true,
  format: 'webvtt',
});
// Get captions via webhook
// POST /webhooks/captions
// { "type": "caption.segment", "text": "...", "speaker": "Speaker 1" }
Technical specifications
Frequently asked questions
What caption accuracy can I expect?
98.5% for live streams (Deepgram nova-3) and 99.2% for VOD (Cohere Transcribe with post-processing). Speaker identification included at no extra cost. Accuracy improves with custom vocabulary lists.
Can I translate captions in real time?
Yes. Source language detected automatically. Translate to 50+ target languages simultaneously. Each viewer selects their preferred language from the player controls. Translation adds under 100ms to total latency.
Does it meet broadcast caption requirements?
Yes. FCC closed captioning compliant for US broadcast. Exports in SCC (US broadcast), EBU-STL (European broadcast), WebVTT (web), and SRT (general). Automatic compliance reporting for audit trails.
Can I edit captions before publishing?
Yes. Real-time correction editor for live streams. VOD captions go through a review queue with inline editing, bulk find-and-replace, and approval workflow before publishing.
How does the dual-engine system work?
Live streams use Deepgram nova-3 for speed (sub-200ms). VOD content uses Cohere Transcribe for maximum accuracy (99.2%). The system routes automatically based on content type — no configuration needed.
What about Rev.com human captioning?
Available as an optional add-on for content requiring human-level accuracy. 99.9% accuracy with 24-hour turnaround. Ideal for legal depositions, medical transcriptions, and archival content.
Accessible content in 200 milliseconds
Start captioning with WAVE. 60 free minutes every month.