Captions

98.5% accurate. Under 200ms. 50+ languages.

AI-powered captioning that runs in real time. Deepgram nova-3 for live streams, Cohere Transcribe for VOD. Speaker identification, translation, and FCC-compliant export, all automatic.


98.5% live accuracy
<200ms latency
50+ languages
99.2% VOD accuracy

Captioning that keeps up with your speakers

From raw audio to broadcast-compliant captions. Live and VOD, in 50+ languages, with speaker identification.

Sub-200ms live captions

Deepgram nova-3 processes speech in under 200ms. Viewers see captions while the speaker is still talking.

50+ languages

Auto-detect source language. Translate to 50+ target languages simultaneously. Viewers choose their preferred language.

Speaker identification

Automatic diarization labels who is speaking. Custom names mapped to voice profiles. Color-coded per speaker.

AI post-processing

Auto-punctuation, profanity filtering, grammar correction, and terminology enforcement. Domain-specific vocabularies for medical, legal, and technical content.

5 export formats

WebVTT (web), SRT (general), SCC (broadcast), EBU-STL (European broadcast), and burned-in video. Batch export entire libraries.

FCC and ADA compliant

Meets FCC closed captioning requirements and ADA accessibility standards. Automatic compliance reports for audit trails.

Two engines. One API.

The system picks the right engine for each job. You get speed for live and precision for VOD — without configuring anything.

Live engine

Deepgram nova-3

Sub-200ms end-to-end latency
98.5% accuracy on first pass
Streaming word-by-word display
Real-time speaker diarization
Live correction editor

VOD engine

Cohere Transcribe

99.2% accuracy with post-processing
Zero API cost (Cohere free tier)
Batch processing for entire libraries
Paragraph segmentation
Review queue before publish

Optional: Rev.com human captioning — 99.9% accuracy, 24-hour turnaround for archival and legal content

Three steps to accessible content

Connect your stream or upload

Point any live stream at WAVE over RTMP, SRT (Secure Reliable Transport), WebRTC, or NDI. For VOD, upload files or connect your storage bucket. Audio is extracted automatically.

Configure language and output

Set the source language (or auto-detect). Choose target translation languages. Pick an export format: WebVTT for web, SCC for broadcast, or SRT (SubRip) for general use. One stream, multiple caption tracks.

Captions appear automatically

Live captions render in the player within 200ms. VOD captions generate and enter the review queue. Export, embed, or burn in — your choice.

Built for these workflows

Live broadcast TV

Webinars and events

Worship services

Education and e-learning

Social media clips

News and journalism

Corporate town halls

Legal depositions

One API for all caption workflows

Start a caption session, then get results via webhook or polling. Works with any stream or file.

// Start live captioning.
// `wave` is assumed to be an already-initialized WAVE client; setup is not shown here.
const session = await wave.captions.start({
  streamId: 'str_abc123',
  languages: ['en', 'es', 'fr'],
  speakerDiarization: true,
  format: 'webvtt',
});

// Get captions via webhook
// POST /webhooks/captions
// { "type": "caption.segment", "text": "...", "speaker": "Speaker 1" }
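A receiving endpoint might process that webhook payload along the following lines. This is a minimal sketch, not WAVE's documented handler: only the `type`, `text`, and `speaker` fields appear in the example payload, and the `handleCaptionEvent` and `renderTranscript` functions and the in-memory `transcript` array are hypothetical names used for illustration.

```javascript
// Hypothetical in-memory store for received caption segments.
const transcript = [];

// Process one parsed webhook body. Only "caption.segment" events with
// non-empty text are stored; anything else is ignored so that unknown
// event types do not break the handler.
function handleCaptionEvent(event) {
  if (!event || event.type !== 'caption.segment') {
    return false;
  }
  if (typeof event.text !== 'string' || event.text.trim() === '') {
    return false;
  }
  transcript.push({
    speaker: event.speaker || 'Unknown', // fallback label is an assumption
    text: event.text.trim(),
  });
  return true;
}

// Render the stored segments as "Speaker: text" lines.
function renderTranscript() {
  return transcript.map((s) => `${s.speaker}: ${s.text}`).join('\n');
}
```

In a real service, `handleCaptionEvent` would be called from whatever HTTP framework receives the POST, after the request body has been parsed as JSON.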

Technical specifications

Live accuracy: 98.5%
VOD accuracy: 99.2%
Latency: <200ms
Languages: 50+
Speaker ID: Automatic
Formats: WebVTT, SRT, SCC, EBU-STL

Frequently asked questions

What caption accuracy can I expect?

98.5% for live streams (Deepgram nova-3) and 99.2% for VOD (Cohere Transcribe with post-processing). Speaker identification included at no extra cost. Accuracy improves with custom vocabulary lists.

Can I translate captions in real time?

Yes. Source language detected automatically. Translate to 50+ target languages simultaneously. Each viewer selects their preferred language from the player controls. Translation adds under 100ms to total latency.

Does it meet broadcast caption requirements?

Yes. FCC closed captioning compliant for US broadcast. Exports in SCC (US broadcast), EBU-STL (European broadcast), WebVTT (web), and SRT (general). Automatic compliance reporting for audit trails.
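Of the formats listed, WebVTT and SRT are both plain-text cue formats. They differ mainly in the file header, cue numbering, and the millisecond separator in timestamps (a period in WebVTT, a comma in SRT). The sketch below illustrates that difference only. The `cues` shape, with `start` and `end` in seconds, and the function names are assumptions for illustration, not WAVE's export API.

```javascript
// Format a time in seconds as HH:MM:SS<sep>mmm.
function formatTimestamp(seconds, sep) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}${sep}${pad(ms, 3)}`;
}

// WebVTT: "WEBVTT" header, cues separated by blank lines, "." before ms.
function toWebVTT(cues) {
  const body = cues.map(
    (c) => `${formatTimestamp(c.start, '.')} --> ${formatTimestamp(c.end, '.')}\n${c.text}`
  );
  return ['WEBVTT', ...body].join('\n\n') + '\n';
}

// SRT: no header, cues numbered from 1, "," before ms.
function toSRT(cues) {
  return cues
    .map(
      (c, i) =>
        `${i + 1}\n${formatTimestamp(c.start, ',')} --> ${formatTimestamp(c.end, ',')}\n${c.text}`
    )
    .join('\n\n') + '\n';
}
```

SCC and EBU-STL are broadcast formats with their own encodings and are not covered by this sketch.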

Can I edit captions before publishing?

Yes. Real-time correction editor for live streams. VOD captions go through a review queue with inline editing, bulk find-and-replace, and approval workflow before publishing.
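To make the bulk find-and-replace step concrete, here is a rough sketch of applying a glossary of corrections across caption segments. It is not WAVE's editor implementation: the `glossary` object, the segment shape, and the function names are all hypothetical, and the whole-word, case-insensitive matching is one reasonable choice among several.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace each glossary key, matched case-insensitively as a whole word
// or phrase, with its preferred spelling.
function applyGlossary(text, glossary) {
  let result = text;
  for (const [wrong, preferred] of Object.entries(glossary)) {
    const pattern = new RegExp(`\\b${escapeRegExp(wrong)}\\b`, 'gi');
    // A replacer function avoids "$" in `preferred` being treated specially.
    result = result.replace(pattern, () => preferred);
  }
  return result;
}

// Apply the glossary to every segment without mutating the input.
function applyToSegments(segments, glossary) {
  return segments.map((seg) => ({ ...seg, text: applyGlossary(seg.text, glossary) }));
}
```

For example, a glossary of `{ 'web vtt': 'WebVTT' }` would normalize that phrase wherever it appears as whole words, while leaving longer words that merely contain it untouched.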

How does the dual-engine system work?

Live streams use Deepgram nova-3 for speed (sub-200ms). VOD content uses Cohere Transcribe for maximum accuracy (99.2%). The system routes automatically based on content type — no configuration needed.
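The routing rule described above can be pictured as a lookup on content type. This is an illustrative sketch only: the page does not describe how WAVE implements routing, and `pickEngine`, the `contentType` values, and the engine identifier strings are hypothetical.

```javascript
// Hypothetical engine table: live favors latency, VOD favors accuracy.
const ENGINES = {
  live: { name: 'deepgram-nova-3', priority: 'latency' },
  vod: { name: 'cohere-transcribe', priority: 'accuracy' },
};

// Return the engine entry for a content type, or throw for unknown types.
function pickEngine(contentType) {
  if (!Object.prototype.hasOwnProperty.call(ENGINES, contentType)) {
    throw new Error(`Unsupported content type: ${contentType}`);
  }
  return ENGINES[contentType];
}
```

Because routing depends only on whether the input is a live stream or a file, the caller never has to name an engine explicitly.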

What about Rev.com human captioning?

Available as an optional add-on for content requiring human-level accuracy. 99.9% accuracy with 24-hour turnaround. Ideal for legal depositions, medical transcription, and archival content.

Accessible content in 200 milliseconds

Start captioning with WAVE. 60 free minutes every month.

