While diarization assigns generic labels (SPEAKER_00, SPEAKER_01, etc.), identification assigns specific identities to speakers.
What are voiceprints?
A voiceprint is a unique digital representation of a person’s voice characteristics, similar to a fingerprint but for voice. It captures the distinctive features of how someone speaks, allowing the system to recognize that person in future audio recordings.

Voiceprints are for identification only; they do not improve the accuracy of diarization. Diarization separates speakers, while identification assigns names or labels to those speakers.
Voiceprint requirements
- One voiceprint per speaker: Create only one voiceprint for each person.
- Single speaker only: The recording must contain only the target speaker’s voice with no overlapping speakers.
- Maximum duration: Audio samples must be at most 30 seconds long for creating voiceprints.
- Consistent speaking style: The voiceprint should capture the person’s normal speaking voice.
- Language: Our models are language agnostic, so voiceprints can be created in any spoken language.
Prerequisites
Before you start, you’ll need:
- A pyannoteAI account with credit or an active subscription
- An API key
- An audio recording of a single speaker for the voiceprint creation
- An audio recording with multiple speakers for diarization + identification
1. Create a voiceprint
First, create a voiceprint for each speaker you want to identify. This is a one-time process for each person. Send a POST request to the voiceprint endpoint with an audio file containing the speaker’s voice.
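The snippet below is a minimal sketch of this request, assuming the `requests` library, an endpoint at `https://api.pyannote.ai/v1/voiceprint`, and a `url` payload field; treat the exact path and field names as assumptions and confirm them against the pyannoteAI API reference.

create_voiceprint.py

```python
import requests

API_KEY = "YOUR_API_KEY"  # your pyannoteAI API key

# Endpoint path and payload fields are assumptions based on this tutorial.
response = requests.post(
    "https://api.pyannote.ai/v1/voiceprint",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # Publicly reachable URL of a single-speaker recording (max 30 seconds)
        "url": "https://example.com/john-doe-sample.wav",
    },
)
response.raise_for_status()
print(response.json())  # includes the jobId for this voiceprint job
```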
The API returns a `jobId` to track the voiceprint creation:
Example response
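The field names and values below are illustrative rather than authoritative:

```json
{
  "jobId": "3d9f2a7c-1b2e-4c5d-8e9f-0a1b2c3d4e5f",
  "status": "created"
}
```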
Get voiceprint results
To retrieve the voiceprint results, use the same polling or webhook approach described in the How to diarize an audio file tutorial. The process works identically for voiceprint jobs.

Save voiceprints to your own data storage:
- Job outputs (including voiceprints) are automatically deleted after 24 hours.
- Voiceprints are reusable, so store them securely for future identification requests.
Example job voiceprint output
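An illustrative shape for a completed voiceprint job, assuming the voiceprint is returned as an opaque base64 string (the field names are assumptions):

```json
{
  "jobId": "3d9f2a7c-1b2e-4c5d-8e9f-0a1b2c3d4e5f",
  "status": "succeeded",
  "output": {
    "voiceprint": "UklGRiQAAABXQVZFZm10IBAAAAAB..."
  }
}
```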
2. Identify speakers in audio
Now that you have a voiceprint, you can identify a speaker in new audio recordings. Send a POST request to the identify endpoint with the audio file URL and the voiceprints you want to match against.
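A minimal sketch of this request, again assuming the `requests` library and an endpoint at `https://api.pyannote.ai/v1/identify`; the `voiceprints` payload shape is an assumption based on this tutorial.

identify_speakers.py

```python
import requests

API_KEY = "YOUR_API_KEY"
VOICEPRINT = "BASE64_VOICEPRINT"  # the voiceprint string you stored in step 1

# Endpoint path and payload fields are assumptions; check the API reference.
response = requests.post(
    "https://api.pyannote.ai/v1/identify",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # Multi-speaker recording to run diarization + identification on
        "url": "https://example.com/meeting-recording.wav",
        "voiceprints": [
            {"label": "John Doe", "voiceprint": VOICEPRINT},
        ],
    },
)
response.raise_for_status()
print(response.json())  # includes the jobId for the identification job
```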
The API returns a `jobId` for tracking the identification job:
Example response
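As with voiceprint creation, the response shape here is illustrative:

```json
{
  "jobId": "9c8b7a6d-5e4f-3a2b-1c0d-e9f8a7b6c5d4",
  "status": "created"
}
```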
Multiple voiceprints: You can add multiple voiceprints for different people in the same request. Each voiceprint must have a unique label. The system will attempt to match all provided voiceprints against the audio.
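For example, a request payload with two voiceprints might look like this (the labels and variable names are illustrative):

```python
JOHN_VOICEPRINT = "BASE64_VOICEPRINT_FOR_JOHN"  # stored from step 1
JANE_VOICEPRINT = "BASE64_VOICEPRINT_FOR_JANE"

payload = {
    "url": "https://example.com/meeting-recording.wav",
    "voiceprints": [
        # Each voiceprint must carry a unique label
        {"label": "John Doe", "voiceprint": JOHN_VOICEPRINT},
        {"label": "Jane Smith", "voiceprint": JANE_VOICEPRINT},
    ],
}
```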
3. Get identification results
To retrieve the identification results, use the same polling or webhook approach described in the How to diarize an audio file tutorial. The process works identically for identification jobs.

Example identification output
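The output shape below is illustrative, assuming matched segments carry the voiceprint label while unmatched segments keep their generic diarization label (field names and the 0-100 confidence scale are assumptions drawn from the parameters described later in this tutorial):

```json
{
  "jobId": "9c8b7a6d-5e4f-3a2b-1c0d-e9f8a7b6c5d4",
  "status": "succeeded",
  "output": {
    "identification": [
      { "start": 0.5, "end": 4.2, "speaker": "John Doe", "confidence": 87 },
      { "start": 4.8, "end": 9.1, "speaker": "SPEAKER_01", "confidence": 12 }
    ]
  }
}
```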
Understanding the results
Diarization vs. identification
- Diarization: Separates audio into speaker segments with generic labels (SPEAKER_00, SPEAKER_01, etc.)
- Identification: Matches those segments to known voiceprints with specific labels (John Doe, Jane Smith, etc.)
Confidence scores
The confidence scores show how well each voiceprint matches each speaker segment:
- Higher scores indicate better matches
- Use the `threshold` parameter to filter out low-confidence matches
- Consider the context when interpreting confidence scores
Matching options
- `matching.threshold`: Minimum confidence score required for a match (0-100, default: 0). Set higher values (50-70) for stricter matching, lower values for more lenient matching.
- `matching.exclusive`: Prevents multiple speakers from matching the same voiceprint (default: true). Set to `false` if you want multiple speakers to potentially match the same voiceprint.
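As a sketch, these options would be passed alongside the voiceprints in the identify request; the placement of the `matching` block is an assumption based on the parameter names above.

```python
payload = {
    "url": "https://example.com/meeting-recording.wav",
    "voiceprints": [
        {"label": "John Doe", "voiceprint": "BASE64_VOICEPRINT"},
    ],
    "matching": {
        "threshold": 60,    # reject matches scoring below 60
        "exclusive": True,  # at most one speaker per voiceprint
    },
}
```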