Diarize a remote file (from a URL)
If you have a media file accessible via a URL, you can provide the URL to the file in the request body. Typically you would use this method with a file stored in a cloud storage service such as Amazon S3. If the bucket or object is private, you can follow this tutorial to create a signed URL that grants temporary access. For performance reasons, the input audio must respect the following limits:
- Maximum duration of 24 hours
- Maximum file size of 1 GiB
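A minimal Python sketch of submitting a remote file, using only the standard library. It assumes the diarization endpoint is https://api.pyannote.ai/v1/diarize and that the media URL goes in a body field named url; check both against the API reference. The within_limits pre-check mirrors the 24-hour and 1 GiB limits above and treats values exactly at a limit as allowed, which is also an assumption.

```python
import json
import urllib.request
from typing import Optional

# Assumed endpoint; verify against the API reference.
DIARIZE_ENDPOINT = "https://api.pyannote.ai/v1/diarize"

MAX_DURATION_SECONDS = 24 * 60 * 60  # 24 hours
MAX_SIZE_BYTES = 1024 ** 3  # 1 GiB


def within_limits(duration_seconds: float, size_bytes: int) -> bool:
    """Client-side pre-check against the documented input limits (inclusive, assumed)."""
    return duration_seconds <= MAX_DURATION_SECONDS and size_bytes <= MAX_SIZE_BYTES


def build_diarize_request(
    media_url: str, token: str, webhook: Optional[str] = None
) -> urllib.request.Request:
    """Build, but do not send, a POST request asking to diarize a remote file."""
    body = {"url": media_url}  # the "url" field name is an assumption
    if webhook is not None:
        body["webhook"] = webhook
    return urllib.request.Request(
        DIARIZE_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For a private S3 object, media_url would be the signed URL. Building the request separately from submit() keeps the body easy to inspect before anything is sent.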
Diarization of a Local File
If you have a media file stored locally, you can upload it to our temporary media storage and provide the URL to the file in the request body. Typically you would use this method for a file stored on a local machine or device. Follow the How to upload files tutorial to upload your file.
Choosing models
As new models are released, you can specify which model to use by adding the model parameter to your request body.
The model parameter lets you select a specific model, which can be useful for optimizing performance or accuracy for your use case.
The available models are:
- precision-1: The current precision model (default)
- precision-2: Our latest precision model (37% better speaker assignment)
We recommend using precision-2, as it will become the new default in the near future.
precision-1 and precision-2 are fully compatible. You can use voiceprints from either model interchangeably for identification tasks.
Number of speakers
By default, pyannoteAI automatically detects the number of speakers present in the input audio, and there is no limit on how many speakers can be detected. It is common to diarize audio where the number of speakers is already known to be 2 (e.g. phone conversations or interviews). If this is your use case, you can add "numSpeakers": 2 to the body of your request.
This will detect exactly 2 speakers, which typically results in better overall diarization performance.
If the number of speakers is unknown but falls within a certain range, or if you want to set a limit, you can use the "minSpeakers" and "maxSpeakers" parameters. For example, specifying "minSpeakers": 2, "maxSpeakers": 4 restricts detection to between two and four speakers.
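A sketch of client-side validation for the three speaker-count parameters, in Python. The speaker_options helper is hypothetical; it encodes the rules stated in the notes that follow (numSpeakers excludes minSpeakers and maxSpeakers, and minSpeakers must not exceed maxSpeakers) plus the x >= 1 constraint from the Body reference, and returns only the fields that were set.

```python
from typing import Dict, Optional


def speaker_options(
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
) -> Dict[str, int]:
    """Return the speaker-count fields for a request body, enforcing the documented rules."""
    named = {
        "numSpeakers": num_speakers,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
    }
    for name, value in named.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")

    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError("numSpeakers can't be used together with minSpeakers or maxSpeakers")

    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("minSpeakers must be less than or equal to maxSpeakers")

    # Omit unset fields; with none set, the number of speakers is detected automatically.
    return {name: value for name, value in named.items() if value is not None}
```

For example, speaker_options(num_speakers=2) gives {"numSpeakers": 2}, which can be merged into the rest of the request body.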
- numSpeakers=2 is equivalent to minSpeakers=2 and maxSpeakers=2.
- numSpeakers can't be used together with minSpeakers or maxSpeakers.
- If both minSpeakers and maxSpeakers are set, minSpeakers must be less than or equal to maxSpeakers.
Diarization without overlapping speech
If you need results where speech from different speakers does not overlap, you can enable exclusive diarization by adding "exclusive": true to your request body.
This will include exclusive diarization values in the output (equivalent to diarization but without overlapping speech).
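To see the difference between the two outputs, a small hypothetical Python helper can test whether any segments overlap. It assumes each segment is a dict with speaker, start, and end keys (in seconds) and that the exclusive values arrive under an exclusiveDiarization key next to diarization; these key names and the sample data are assumptions for illustration, not taken from this page.

```python
from typing import Dict, List


def has_overlap(segments: List[Dict]) -> bool:
    """Return True if any two segments, from any speakers, overlap in time."""
    ordered = sorted(segments, key=lambda segment: segment["start"])
    # With segments sorted by start time, an overlap anywhere implies an
    # overlap between some pair of neighbours, so checking neighbours suffices.
    for previous, current in zip(ordered, ordered[1:]):
        if current["start"] < previous["end"]:
            return True
    return False


# Hypothetical job output, for illustration only.
example_output = {
    "diarization": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.0},
        {"speaker": "SPEAKER_01", "start": 1.5, "end": 3.0},
    ],
    "exclusiveDiarization": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.5},
        {"speaker": "SPEAKER_01", "start": 1.5, "end": 3.0},
    ],
}
```

Here has_overlap(example_output["diarization"]) is True, because both speakers talk between 1.5 s and 2.0 s, while the exclusive version assigns that stretch to a single speaker.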
Confidence Scores
Confidence scores measure how certain the diarization model is of its predictions. These scores range from 0 to 100, with higher values indicating greater confidence. To include confidence scores in your diarization results, add "confidence": true to your request body. When enabled, the job output will include:
- The confidence object, containing:
  - score: an array with the confidence score for each sample
  - resolution: the time interval, in seconds, between confidence score samples
For example, if resolution is 0.02, each confidence score represents a 20-millisecond interval in the audio.
Confidence scores can be particularly useful for:
- Identifying segments where the model is less certain, which may benefit from manual review
- Filtering results based on a confidence threshold for higher accuracy
- Analyzing the overall reliability of the diarization for a given audio file
Confidence scores are only included in the output when you set "confidence": true in your request.
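As a sketch of the first and last use cases, the hypothetical Python helpers below turn the score array into time ranges that fall below a threshold (for example, to queue them for manual review) and compute an average score for the file. They assume sample i covers the interval from i × resolution to (i + 1) × resolution seconds; the threshold of 50 is an arbitrary example value.

```python
from typing import List, Tuple


def low_confidence_regions(
    score: List[float], resolution: float, threshold: float = 50.0
) -> List[Tuple[float, float]]:
    """Merge runs of consecutive samples scoring below `threshold` into (start, end) ranges in seconds."""
    regions = []
    run_start = None  # index of the first sample in the current low-confidence run
    for index, value in enumerate(score):
        if value < threshold:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            regions.append((run_start * resolution, index * resolution))
            run_start = None
    if run_start is not None:  # the run extends to the last sample
        regions.append((run_start * resolution, len(score) * resolution))
    return regions


def mean_confidence(score: List[float]) -> float:
    """Average score as a rough overall reliability indicator (0.0 for an empty array)."""
    return sum(score) / len(score) if score else 0.0
```

Given confidence = output["confidence"], calling low_confidence_regions(confidence["score"], confidence["resolution"]) returns the ranges worth reviewing.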
Turn-level confidence scores
To include turn-level confidence scores in your results, add "turnLevelConfidence": true to your request body.
When enabled, the job output will include a "confidence" object for each diarization segment.
The object uses each speaker as a key, mapped to a number from 0 to 100 indicating the confidence score for that speaker assignment.
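A hypothetical Python helper for reviewing turn-level scores: it flags segments whose assigned speaker has a confidence below a threshold. It assumes each segment carries speaker, start, and end keys alongside the confidence object described above; that segment layout and the default threshold of 70 are assumptions.

```python
from typing import Dict, List


def uncertain_turns(segments: List[Dict], threshold: float = 70.0) -> List[Dict]:
    """Return segments whose assigned speaker scores below `threshold`.

    A segment with no score for its assigned speaker is also returned,
    since nothing confirms the assignment.
    """
    flagged = []
    for segment in segments:
        scores = segment.get("confidence", {})
        assigned_score = scores.get(segment["speaker"])
        if assigned_score is None or assigned_score < threshold:
            flagged.append(segment)
    return flagged
```

Raising the threshold flags more turns for review; lowering it keeps only the most doubtful ones.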
Getting results using a webhook
If the optional webhook parameter is set, the results will be sent to the provided URL.
The webhook URL is where any update related to the diarization job will be sent.
The job output will be sent as a JSON object in the request body.
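A minimal receiver for these notifications, sketched with Python's standard-library http.server. It only parses the JSON request body and stores it, without assuming which keys the payload contains. WebhookHandler, serve, start_in_background, and post_json are hypothetical names; post_json exists only to send a local test payload.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # JSON payloads delivered to this endpoint, in arrival order


class WebhookHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON job updates and stores them in `received`."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            payload = json.loads(raw)
        except ValueError:  # malformed JSON or undecodable bytes
            self.send_response(400)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        received.append(payload)
        # Acknowledge promptly; process the job output elsewhere.
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def serve(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Run the receiver in the foreground (blocks)."""
    HTTPServer((host, port), WebhookHandler).serve_forever()


def start_in_background(host: str = "127.0.0.1", port: int = 0) -> HTTPServer:
    """Start the receiver on a daemon thread (port 0 picks a free port)."""
    server = HTTPServer((host, port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_json(url: str, payload: dict) -> int:
    """Send a test payload directly (bypassing any configured proxy); return the HTTP status."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener.open(request) as response:
        return response.status
```

The publicly reachable URL of this receiver is what you would pass as the webhook parameter.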
Polling for job results
As an alternative to webhooks, you can poll the get job endpoint to retrieve job results. To poll for job results, use the status field in the job response to check whether the job is completed.
Once the job is completed, you can retrieve the job results by accessing the output field.
The output field contains the job results and is only available when the job status is succeeded. Job results are automatically deleted 24 hours after job completion.
A job is completed when its status field is succeeded, failed, or canceled.
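A polling loop sketched in Python. It treats succeeded, failed, and canceled as terminal, as described above, and treats any other status as still in progress. The job URL https://api.pyannote.ai/v1/jobs/{job_id} is an assumption, as are the helper names; wait_for_job takes the fetch function as a parameter so it can be exercised without network access.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Assumed path of the get job endpoint; verify against the API reference.
JOBS_ENDPOINT = "https://api.pyannote.ai/v1/jobs"

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def make_fetcher(job_id: str, token: str) -> Callable[[], Dict[str, Any]]:
    """Return a function that performs one GET request for the job (network I/O)."""
    url = f"{JOBS_ENDPOINT}/{job_id}"

    def fetch() -> Dict[str, Any]:
        request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch


def wait_for_job(
    fetch_job: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status; return its output on success."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        status = job.get("status")
        if status == "succeeded":
            return job["output"]  # only present once the job has succeeded
        if status in TERMINAL_STATUSES:
            raise RuntimeError(f"job finished with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

Typical use would be wait_for_job(make_fetcher(job_id, token)). Since results are deleted 24 hours after completion, it is worth fetching and storing the output soon after the job finishes.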
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
- url: URL of the audio file to be processed. Example: "https://example.com/audio.wav"
- webhook: Webhook URL to receive results when the job is completed (optional). Example: "https://example.com/webhook"
- model: Available options: precision-1, precision-2. Example: "precision-2"
- numSpeakers: Number of speakers. Only use this if the number of speakers is known in advance; if it is not provided, the number of speakers is detected automatically. Setting this value results in better overall diarization performance. In rare cases where we cannot honor this request (e.g. short files with a large number of speakers), a warning will be added to the output. Equivalent to sending minSpeakers == maxSpeakers. Constraint: x >= 1. Example: 2
- minSpeakers: Minimum number of speakers (must be <= maxSpeakers if both are set). Constraint: x >= 1. Example: 1
- maxSpeakers: Maximum number of speakers (must be >= minSpeakers if both are set). Constraint: x >= 1. Example: 4
- turnLevelConfidence: Includes turn-level confidence values in the output. Example: true
- exclusive: Includes exclusive diarization values in the output (equivalent to diarization but without overlapping speech). Example: true
- confidence: Includes confidence values in the output. Example: true
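The body fields can be assembled with a small hypothetical Python helper. It assumes the audio URL field is named url, leaves optional fields out unless they are set, sends boolean flags only when enabled, rejects model names other than precision-1 and precision-2, and enforces the numeric constraints and speaker-count rules listed for these fields.

```python
from typing import Any, Dict, Optional

MODELS = {"precision-1", "precision-2"}


def build_body(
    url: str,
    *,
    webhook: Optional[str] = None,
    model: Optional[str] = None,
    num_speakers: Optional[int] = None,
    min_speakers: Optional[int] = None,
    max_speakers: Optional[int] = None,
    turn_level_confidence: bool = False,
    exclusive: bool = False,
    confidence: bool = False,
) -> Dict[str, Any]:
    """Build a diarization request body, omitting fields that are not set."""
    if model is not None and model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    counts = {"numSpeakers": num_speakers, "minSpeakers": min_speakers, "maxSpeakers": max_speakers}
    for name, value in counts.items():
        if value is not None and value < 1:
            raise ValueError(f"{name} must be >= 1")
    if num_speakers is not None and (min_speakers is not None or max_speakers is not None):
        raise ValueError("numSpeakers can't be combined with minSpeakers or maxSpeakers")
    if min_speakers is not None and max_speakers is not None and min_speakers > max_speakers:
        raise ValueError("minSpeakers must be <= maxSpeakers")

    body: Dict[str, Any] = {"url": url}  # field name assumed
    optional = {"webhook": webhook, "model": model, **counts}
    body.update({key: value for key, value in optional.items() if value is not None})
    flags = {"turnLevelConfidence": turn_level_confidence, "exclusive": exclusive, "confidence": confidence}
    body.update({key: True for key, enabled in flags.items() if enabled})
    return body
```

Omitting unset fields, rather than sending nulls, leaves the server-side defaults (such as automatic speaker counting) in effect.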