Types of confidence scores
There are three types of confidence scores available:
1. Sample-level confidence scores
Sample-level confidence scores provide granular confidence values at regular intervals throughout the audio. To include sample-level confidence scores in your diarization or identification results, add "confidence": true to your request body. When enabled, the job output will include:
- The confidence object containing:
  - score, an array with the confidence score for each sample
  - resolution, indicating the time interval in seconds between confidence score samples
For example, if resolution is 0.02, each confidence score represents a 20-millisecond interval in the audio.
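To make the relationship between score and resolution concrete, here is a minimal sketch that pairs each sample with a timestamp. It assumes that sample i covers the interval starting at i × resolution seconds from the beginning of the audio; the helper name sample_times and the example values are illustrative, not real job output.

```python
def sample_times(confidence):
    """Pair each sample-level score with the start time (in seconds) of its interval."""
    resolution = confidence["resolution"]
    return [
        (round(index * resolution, 6), score)
        for index, score in enumerate(confidence["score"])
    ]


# Illustrative values only, shaped like the confidence object described above.
confidence = {"score": [95, 91, 42, 38, 88], "resolution": 0.02}

for start, score in sample_times(confidence):
    print(f"{start:.2f}s  {score}")
```

With a resolution of 0.02, the third sample (index 2) is paired with the interval starting at 0.04 seconds.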
2. Turn-level confidence scores
Turn-level confidence scores provide confidence values for each diarization segment (turn), making it easier to assess the quality of specific speaker assignments. To include turn-level confidence scores in your results, add "turnLevelConfidence": true to your request body. When enabled, the job output will include a "confidence" object for each diarization segment. The object uses each speaker as a key, with a number from 0 to 100 indicating the confidence score for that speaker assignment.
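As a rough illustration of reading a turn-level confidence object, the sketch below picks the highest-scoring speaker for one segment. The field names "start" and "end", the speaker labels, and the helper name most_confident_speaker are assumptions made for the example; only the per-segment "confidence" object is described above.

```python
def most_confident_speaker(segment):
    """Return (speaker, score) for the highest-scoring entry in a turn's confidence object."""
    return max(segment["confidence"].items(), key=lambda item: item[1])


# Illustrative segment; "start", "end", and the speaker labels are assumed.
segment = {
    "start": 1.2,
    "end": 3.8,
    "confidence": {"SPEAKER_00": 87, "SPEAKER_01": 13},
}

speaker, score = most_confident_speaker(segment)
print(f"{speaker} ({score}/100)")
```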
3. Identification confidence scores
Identification confidence scores are specific to speaker identification tasks and show how well each voiceprint matches each speaker segment. These scores are included automatically when using the identify endpoint. The identification output includes a voiceprints array where each speaker has a confidence object containing the confidence scores for each voiceprint label.
Identification confidence scores are different from diarization confidence scores. They measure how well a voiceprint matches a speaker, not how certain the diarization model is about the speaker assignment.
Enabling confidence scores
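By default, confidence scores are not included in the output. You must explicitly enable them in your request: set "confidence": true for sample-level scores and "turnLevelConfidence": true for turn-level scores. Identification confidence scores need no flag; they are included automatically in the voiceprints section of the output.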
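The two flags can be added to a request body as in the sketch below. Only "confidence" and "turnLevelConfidence" come from this page; the "url" field, the placeholder address, and the helper name build_request_body are assumptions for illustration, so check the API reference for the exact request schema.

```python
import json


def build_request_body(media_url, sample_level=False, turn_level=False):
    """Build a request body, adding confidence flags only when asked for."""
    body = {"url": media_url}  # "url" is an assumed field name
    if sample_level:
        body["confidence"] = True
    if turn_level:
        body["turnLevelConfidence"] = True
    return body


body = build_request_body(
    "https://example.com/audio.wav",  # placeholder URL
    sample_level=True,
    turn_level=True,
)
print(json.dumps(body, indent=2))
```

Leaving both arguments at their defaults produces a body without confidence flags, which matches the default behavior of not returning scores.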
Using confidence scores
Quality assessment
Confidence scores can be particularly useful for:
- Identifying segments where the model is less certain, which may benefit from manual review
- Filtering results based on a confidence threshold for higher accuracy
- Analyzing the overall reliability of the diarization or identification for a given audio file
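For the last point, one possible per-file reliability summary is a duration-weighted mean of turn-level confidence plus the share of audio below a threshold. This is a sketch under assumptions: the segment fields "start", "end", and "speaker", the threshold of 70, and the helper name summarize_confidence are illustrative, and each turn's score is taken to be the confidence of its assigned speaker.

```python
def summarize_confidence(segments, threshold=70):
    """Duration-weighted mean confidence and share of audio below the threshold."""
    total = 0.0
    weighted = 0.0
    low = 0.0
    for seg in segments:
        duration = seg["end"] - seg["start"]
        # Assumed: the relevant score is the one for the assigned speaker.
        score = seg["confidence"][seg["speaker"]]
        total += duration
        weighted += duration * score
        if score < threshold:
            low += duration
    if total == 0:
        return {"mean_confidence": None, "low_confidence_share": None}
    return {
        "mean_confidence": weighted / total,
        "low_confidence_share": low / total,
    }


# Illustrative segments, not real job output.
segments = [
    {"start": 0.0, "end": 4.0, "speaker": "SPEAKER_00",
     "confidence": {"SPEAKER_00": 90, "SPEAKER_01": 10}},
    {"start": 4.0, "end": 6.0, "speaker": "SPEAKER_01",
     "confidence": {"SPEAKER_00": 40, "SPEAKER_01": 60}},
    {"start": 6.0, "end": 10.0, "speaker": "SPEAKER_00",
     "confidence": {"SPEAKER_00": 80, "SPEAKER_01": 20}},
]

print(summarize_confidence(segments))
```

Weighting by duration keeps a handful of very short, uncertain turns from dominating the summary of a long recording.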
Human-in-the-loop correction
Confidence scores enable efficient human-in-the-loop workflows by highlighting segments that need attention:
- Low confidence detection: Identify segments with confidence scores below your threshold (e.g., < 70) for manual review.
- Prioritized review: Focus human review time on the most uncertain segments rather than reviewing entire transcripts.
- Quality control: Use confidence scores as a quality metric to ensure diarization meets your accuracy requirements.
Example: Human review workflow with turn-level confidence
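A human-in-the-loop correction workflow (confidence_review.py) might flag every turn whose confidence falls below a chosen threshold and queue it for manual review, most uncertain turns first.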
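One way to sketch such a review queue is shown below. It is not the original confidence_review.py; the segment fields "start", "end", and "speaker", the threshold of 70, and the function names are assumptions, and each turn is scored by the confidence of its assigned speaker.

```python
def turn_confidence(segment):
    """Confidence (0-100) of the speaker assigned to this turn (assumed convention)."""
    return segment["confidence"][segment["speaker"]]


def build_review_queue(segments, threshold=70):
    """Return turns below the threshold, least confident first."""
    flagged = [seg for seg in segments if turn_confidence(seg) < threshold]
    return sorted(flagged, key=turn_confidence)


# Illustrative diarization output with turn-level confidence enabled.
diarization = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00",
     "confidence": {"SPEAKER_00": 96, "SPEAKER_01": 4}},
    {"start": 2.5, "end": 3.1, "speaker": "SPEAKER_01",
     "confidence": {"SPEAKER_00": 45, "SPEAKER_01": 55}},
    {"start": 3.1, "end": 7.4, "speaker": "SPEAKER_00",
     "confidence": {"SPEAKER_00": 68, "SPEAKER_01": 32}},
]

for seg in build_review_queue(diarization):
    print(
        f"Review {seg['start']:.1f}s-{seg['end']:.1f}s: "
        f"{seg['speaker']} at {turn_confidence(seg)}/100"
    )
```

Sorting the flagged turns by ascending confidence puts the most doubtful assignments at the top of the reviewer's list.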
Example: Identification confidence filtering
For identification tasks, you can filter results based on identification confidence (identification_filter.py), keeping only the voiceprint matches whose score meets your threshold.
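The sketch below shows one way such a filter could look; it is not the original identification_filter.py. It assumes each entry in the voiceprints array carries a "speaker" field alongside its confidence object, and the voiceprint labels, the threshold of 70, and the function name confident_matches are illustrative.

```python
def confident_matches(voiceprints, threshold=70):
    """Map each speaker to its best voiceprint label, or None if no score reaches the threshold."""
    matches = {}
    for entry in voiceprints:
        scores = entry["confidence"]
        if not scores:
            matches[entry["speaker"]] = None
            continue
        label, score = max(scores.items(), key=lambda item: item[1])
        matches[entry["speaker"]] = label if score >= threshold else None
    return matches


# Illustrative identification output; "speaker" is an assumed field name.
voiceprints = [
    {"speaker": "SPEAKER_00", "confidence": {"alice": 92, "bob": 15}},
    {"speaker": "SPEAKER_01", "confidence": {"alice": 30, "bob": 41}},
]

for speaker, label in confident_matches(voiceprints).items():
    print(f"{speaker}: {label or 'no confident match'}")
```

Speakers without a sufficiently strong match are kept with a value of None rather than dropped, so downstream code can still route them to manual review.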
Best practices
- Threshold selection: The optimal confidence threshold depends on your use case. For instance, in critical applications sensitive to false positives, you might want to use a higher threshold.
- Combine multiple types: Using different confidence score types together gives you a fuller picture: turn-level for segment assessment, sample-level for detailed analysis, and identification confidence for voiceprint matching.
- Performance impact: Enabling confidence scores may slightly increase processing time and output size. Only enable them when you actually need the confidence information.
Interpreting confidence scores
Confidence scores range from 0 to 100, with higher values indicating greater confidence in the model’s predictions. Use these scores to identify segments that may need human review or verification based on your specific accuracy requirements. For identification confidence scores, consider using the matching.threshold parameter in your identification request to automatically filter out low-confidence matches.
By leveraging confidence scores effectively, you can build robust diarization and identification workflows that balance automation with human oversight, ensuring high-quality results while minimizing manual effort.