Speaker diarization
Automatically detect each speaker in multi-speaker audio recordings.
- num_speakers: Expected number of speakers. Leave empty for automatic detection.
- min_speakers/max_speakers: Range for speaker detection.
- exclusive: Enable exclusive diarization mode, which is equivalent to diarization but without overlapping speech. Useful for easier reconciliation with STT/ASR results.
- model: Choose the diarization model.
- confidence: Include confidence scores.
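As a rough illustration of how these parameters fit together, the sketch below collects them into a request-body dictionary. The helper name `build_diarization_options` is hypothetical, and the assumption that the parameters travel as top-level JSON fields is not confirmed by this page. The endpoint, authentication, and audio reference are deliberately left out.

```python
import json
from typing import Any, Optional


def build_diarization_options(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    exclusive: bool = False,
    model: Optional[str] = None,
    confidence: bool = False,
) -> dict[str, Any]:
    """Collect the documented parameters into a dictionary (hypothetical helper)."""
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("min_speakers must not exceed max_speakers")

    options: dict[str, Any] = {"exclusive": exclusive, "confidence": confidence}
    optional = {
        "num_speakers": num_speakers,
        "min_speakers": min_speakers,
        "max_speakers": max_speakers,
        "model": model,
    }
    # Unset values are omitted so that speaker count detection stays automatic.
    options.update({key: value for key, value in optional.items() if value is not None})
    return options


# Example: between 2 and 4 speakers, with confidence scores requested.
print(json.dumps(build_diarization_options(min_speakers=2, max_speakers=4, confidence=True)))
```

Omitting unset fields, rather than sending null values, mirrors the "leave empty for automatic detection" wording above.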
Speaker Identification vs. Diarization
Diarization answers “who spoke when?” with generic labels (SPEAKER_00, SPEAKER_01, etc.).
Identification answers “who is speaking?” by recognizing specific known voices using voiceprints.
Voiceprint
Captures a speaker’s voice to identify that person in other audio recordings. Best practices:
- Use clear, high-quality audio (max 30 seconds)
- One voiceprint per speaker
Confidence scores
Receive confidence scores for each speaker segment to assess reliability and perform human-in-the-loop correction. Set the confidence parameter to true in your diarization or identification request.
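One way to use the scores for human-in-the-loop correction is to route low-confidence segments to a reviewer. The sketch below assumes each segment is a dictionary with `speaker`, `start`, `end`, and a numeric `confidence` field. That layout and the sample values are illustrative assumptions, and the threshold is left to the caller because the score scale is not stated here.

```python
def flag_for_review(segments, threshold):
    """Return the segments whose confidence is missing or below the threshold.

    Assumes (hypothetically) that each segment is a dict with a numeric
    "confidence" key on whatever scale the API returns.
    """
    flagged = []
    for segment in segments:
        score = segment.get("confidence")
        if score is None or score < threshold:
            flagged.append(segment)
    return flagged


# Illustrative segments, not real API output.
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.0, "confidence": 90},
    {"speaker": "SPEAKER_01", "start": 2.0, "end": 3.0, "confidence": 40},
]
for segment in flag_for_review(segments, threshold=60):
    print(f"review {segment['speaker']} {segment['start']}-{segment['end']}")
```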
Understanding confidence scores
Overlapped speech detection
Detect when multiple speakers talk over each other and attribute overlapping speech to the correct speakers. Find overlapping speech by comparing timestamps of segments from different speakers. For example, if a SPEAKER_00 segment and a SPEAKER_01 segment both cover 12.5-14.0 seconds, both speakers are talking during that interval.
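The timestamp comparison can be sketched as a pairwise interval intersection. The segment layout (dictionaries with `speaker`, `start`, and `end` in seconds) is an assumption, and the segment boundaries other than 12.5 and 14.0 are made up for illustration.

```python
from itertools import combinations


def find_overlaps(segments):
    """Return the intervals where two segments from different speakers intersect."""
    overlaps = []
    for first, second in combinations(segments, 2):
        if first["speaker"] == second["speaker"]:
            continue
        start = max(first["start"], second["start"])
        end = min(first["end"], second["end"])
        if start < end:  # segments that merely touch do not overlap
            overlaps.append(
                {
                    "speakers": sorted([first["speaker"], second["speaker"]]),
                    "start": start,
                    "end": end,
                }
            )
    return sorted(overlaps, key=lambda overlap: overlap["start"])


# Illustrative segments, not real API output.
segments = [
    {"speaker": "SPEAKER_00", "start": 10.0, "end": 14.0},
    {"speaker": "SPEAKER_01", "start": 12.5, "end": 16.0},
]
print(find_overlaps(segments))
```

The pairwise loop is quadratic in the number of segments, which is usually acceptable for a single recording. A sweep over sorted boundaries scales better for long files.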
You can also use the segment timestamps to calculate statistics such as total speaking time per speaker, total overlap duration, and percentage of overlapped speech.
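A minimal sketch of these statistics follows, assuming segments are dictionaries with `speaker`, `start`, and `end` in seconds and that segments of the same speaker never overlap each other. The overlap percentage is defined here as overlap duration divided by the total time during which anyone speaks. That is one reasonable definition, not necessarily the one used elsewhere.

```python
from collections import defaultdict


def speech_statistics(segments):
    """Compute per-speaker time, total overlap, and overlap percentage."""
    speaker_time = defaultdict(float)
    events = []
    for segment in segments:
        speaker_time[segment["speaker"]] += segment["end"] - segment["start"]
        events.append((segment["start"], 1))   # a speaker starts
        events.append((segment["end"], -1))    # a speaker stops

    # At equal timestamps, stops (-1) sort before starts (+1), so segments
    # that only touch are not counted as overlapping.
    events.sort()

    active = 0          # number of segments open at the current time
    speech = 0.0        # time with at least one active speaker
    overlap = 0.0       # time with at least two active speakers
    previous = None
    for time, delta in events:
        if previous is not None:
            if active >= 1:
                speech += time - previous
            if active >= 2:
                overlap += time - previous
        active += delta
        previous = time

    percentage = 100.0 * overlap / speech if speech else 0.0
    return {
        "speaker_time": dict(speaker_time),
        "overlap_duration": overlap,
        "overlap_percentage": percentage,
    }


# Illustrative segments, not real API output.
print(speech_statistics([
    {"speaker": "SPEAKER_00", "start": 10.0, "end": 14.0},
    {"speaker": "SPEAKER_01", "start": 12.5, "end": 16.0},
]))
```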
STT Orchestration: Speaker-attributed transcripts
We host open-source transcription models like Nvidia Parakeet-tdt-0.6b-v3 with specialized STT + diarization reconciliation logic for speaker-attributed transcripts. To use this feature, make a request to the diarize API endpoint with the transcription: true flag.
Learn more about speech to text with diarization
Already have your own transcript? Merge it with our diarization results using this tutorial.
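For orientation, merging usually comes down to assigning each transcribed word to the speaker whose segment overlaps it most. The sketch below shows that maximum-overlap heuristic. It is not necessarily the method used in the tutorial, and the word and segment layouts (dictionaries with `start` and `end` in seconds) are assumptions.

```python
def assign_speakers(words, segments):
    """Label each word with the speaker whose segment overlaps it the most.

    Words that overlap no segment get speaker None.
    """
    attributed = []
    for word in words:
        best_speaker = None
        best_overlap = 0.0
        for segment in segments:
            overlap = min(word["end"], segment["end"]) - max(word["start"], segment["start"])
            if overlap > best_overlap:
                best_speaker = segment["speaker"]
                best_overlap = overlap
        attributed.append({**word, "speaker": best_speaker})
    return attributed


# Illustrative data, not real API or STT output.
words = [
    {"text": "hello", "start": 0.2, "end": 0.6},
    {"text": "hi", "start": 1.1, "end": 1.4},
]
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
for word in assign_speakers(words, segments):
    print(word["speaker"], word["text"])
```

With non-overlapping segments, such as those produced by the exclusive mode, each word has at most one plausible speaker, which keeps this assignment unambiguous.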