Submit Identification Job
This endpoint creates a new diarization job with speaker identification from a remote audio URL.
Identification of a remote file (from a URL)
If you have a media file accessible via a URL, you can provide the URL to the file in the request body, with the Content-Type header set to application/json.
Typically you would use this method with a file stored in a cloud storage service such as Amazon S3.
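The following is a minimal sketch of building such a request with only the Python standard library. The endpoint URL `https://api.pyannote.ai/v1/identify`, the body field names `url`, `webhook` and `voiceprints`, and the shape of each voiceprint entry are assumptions for illustration; check them against the Body section and the full API reference. The request is built but not sent.

```python
import json
import urllib.request

# Assumed endpoint URL; verify it against the pyannoteAI API reference.
IDENTIFY_URL = "https://api.pyannote.ai/v1/identify"


def build_identify_request(token: str, audio_url: str, webhook: str,
                           voiceprints: list) -> urllib.request.Request:
    """Build (but do not send) a JSON identification request.

    The field names "url", "webhook" and "voiceprints" are assumptions.
    """
    body = {"url": audio_url, "webhook": webhook, "voiceprints": voiceprints}
    return urllib.request.Request(
        IDENTIFY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # The body is JSON, so Content-Type must be application/json.
            "Content-Type": "application/json",
            # Bearer authentication, as described under Authorizations.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_identify_request(
    token="YOUR_API_TOKEN",
    audio_url="https://example-bucket.s3.amazonaws.com/meeting.wav",
    webhook="https://example.com/webhook",
    # Hypothetical voiceprint entry; use the format the API actually expects.
    voiceprints=[{"label": "alice", "voiceprint": "<voiceprint>"}],
)
# To submit it: urllib.request.urlopen(req)
```

Passing `method="POST"` explicitly makes the intent clear, although `urllib` would also default to POST because `data` is set.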
For performance reasons, the input audio must respect the following limits:
- Maximum duration of 24 hours
- Maximum file size of 1 GiB
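A client-side pre-check can reject files before they are submitted. This is a hedged sketch: the function name is hypothetical, both limits are treated as inclusive, and measuring the audio duration (for example with a media probing tool) is left to the caller.

```python
# Limits stated above: at most 24 hours of audio and at most 1 GiB.
MAX_DURATION_S = 24 * 60 * 60   # 86,400 seconds
MAX_SIZE_BYTES = 1024 ** 3      # 1 GiB = 1,073,741,824 bytes


def within_limits(duration_s: float, size_bytes: int) -> bool:
    """Return True if a file satisfies both documented input limits."""
    return duration_s <= MAX_DURATION_S and size_bytes <= MAX_SIZE_BYTES


print(within_limits(3600, 50 * 1024 ** 2))  # 1 hour, 50 MiB
print(within_limits(25 * 3600, 10))          # longer than 24 hours
```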
Number of speakers
By default, pyannoteAI will automatically detect the number of speakers present in the input audio, and there is no limit as to how many speakers can be detected.
It is common to diarize audio where the number of speakers is already known to be 2 (e.g. phone conversations or interviews).
If this is your use case, you can add "numSpeakers": 2 to the body of your request.
This forces the model to detect exactly 2 speakers, which typically results in better overall diarization performance.
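A small sketch of adding "numSpeakers" only when the count is known in advance, and otherwise leaving it out so that the number of speakers is detected automatically. The helper name is hypothetical, and the `url` key in the usage example is an assumed field name; the allowed values 1 and 2 come from the Body section.

```python
from typing import Optional


def add_num_speakers(body: dict, num_speakers: Optional[int] = None) -> dict:
    """Return a copy of a request body, adding numSpeakers only when known.

    Leaving the key out lets the service detect the number of speakers.
    """
    if num_speakers is None:
        return dict(body)
    # The Body section documents numSpeakers as 1 or 2.
    if num_speakers not in (1, 2):
        raise ValueError("numSpeakers must be 1 or 2")
    return {**body, "numSpeakers": num_speakers}


phone_call = add_num_speakers({"url": "https://example.com/call.wav"}, 2)
# -> {"url": "https://example.com/call.wav", "numSpeakers": 2}
```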
Confidence scores
Confidence scores provide a measure of the certainty of the underlying diarization model in its predictions. These scores range from 0 to 100, with higher values indicating greater confidence.
To include confidence scores in your identification results, add "confidence": true to your request body. When enabled, the job output will include:
- A confidence object containing:
  - score, an array with the confidence score for each sample
  - resolution, the time interval in seconds between confidence score samples
For example, if the resolution is 0.02, each confidence score represents a 20-millisecond interval of the audio.
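The arithmetic above can be sketched as a mapping from sample index to time span. This assumes, without confirmation from this page, that the first sample starts at t = 0 and that sample i covers the interval from i × resolution to (i + 1) × resolution seconds; the function name is hypothetical.

```python
from typing import List, Tuple


def sample_intervals(score: List[float],
                     resolution: float) -> List[Tuple[float, float, float]]:
    """Pair each confidence sample with the time span it is assumed to cover.

    Assumes sample i covers [i * resolution, (i + 1) * resolution) seconds,
    i.e. the first sample starts at t = 0. Boundaries are rounded to remove
    floating-point noise such as 0.30000000000000004.
    """
    return [
        (round(i * resolution, 6), round((i + 1) * resolution, 6), s)
        for i, s in enumerate(score)
    ]


# With resolution 0.02, sample index 50 covers 1.00 s to 1.02 s.
intervals = sample_intervals([90.0] * 60, 0.02)
print(intervals[50])
```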
Confidence scores reflect the diarization only: they do not indicate how confidently a speaker was matched to one of the provided voiceprints.
Confidence scores can be particularly useful for:
- Identifying segments where the model is less certain, which may benefit from manual review
- Filtering results based on a confidence threshold for higher accuracy
- Analyzing the overall reliability of the identification for a given audio file
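For the first two uses, one option is to merge consecutive low-scoring samples into time spans that can be flagged for manual review. This is a sketch under the same timing assumption as above (sample i starts at i × resolution); the threshold of 50 is an arbitrary placeholder, not a recommended value.

```python
from typing import List, Optional, Tuple


def low_confidence_segments(score: List[float], resolution: float,
                            threshold: float = 50.0) -> List[Tuple[float, float]]:
    """Merge consecutive samples scoring below `threshold` into (start, end) spans in seconds."""
    segments = []
    start: Optional[int] = None  # index where the current low-confidence run began
    for i, s in enumerate(score):
        if s < threshold and start is None:
            start = i
        elif s >= threshold and start is not None:
            segments.append((round(start * resolution, 6), round(i * resolution, 6)))
            start = None
    if start is not None:  # close a run that lasts until the end of the audio
        segments.append((round(start * resolution, 6), round(len(score) * resolution, 6)))
    return segments


print(low_confidence_segments([95, 30, 20, 90, 10], 0.02, threshold=50))
```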
By default, confidence scores are not included in the output. To enable them, you must explicitly set "confidence": true in your request.
Receiving the results (Webhook)
The webhook URL is where the finished identification results will be sent, as a JSON object in the request body.
Please visit the Webhook documentation for more information on what the webhook sends.
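A minimal webhook receiver can be written with Python's built-in `http.server`. This is a sketch, not a production setup: it stores each JSON payload as-is without assuming any particular field names (see the Webhook documentation for the actual payload), and the host, port and function names are placeholders.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # payloads collected by the handler


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes and decode the JSON body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        received.append(payload)
        # Acknowledge receipt quickly; heavy processing belongs elsewhere.
        self.send_response(200)
        self.end_headers()


def run(host: str = "0.0.0.0", port: int = 8000) -> None:
    """Serve webhook calls until interrupted (host and port are placeholders)."""
    HTTPServer((host, port), WebhookHandler).serve_forever()
```

Calling `run()` starts the listener; the URL it is reachable at is what you would pass as the webhook URL in the request body.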
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
URL of the audio file to be diarized with identification
List of voiceprints to identify against
Webhook URL to receive identification results
Number of speakers, 1 or 2. Only use if the number of speakers is known in advance. Number of speakers is detected automatically if not provided.
Required range: 1 <= x <= 2
Response
ID of the job
Status of the job
Available options: pending, created, succeeded, canceled, failed, running
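When polling or handling results, it helps to know when a job will no longer change. The sketch below assumes that succeeded, canceled and failed are final states and that pending, created and running mean the job is still in progress; this page lists the statuses but does not state which are final. The function name is hypothetical.

```python
# Statuses listed in the Response section.
ALL_STATUSES = {"pending", "created", "succeeded", "canceled", "failed", "running"}
# Assumption: these three are final; the others mean the job is still in progress.
TERMINAL_STATUSES = {"succeeded", "canceled", "failed"}


def is_finished(status: str) -> bool:
    """Return True once a job has reached a (presumed) final status."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unexpected job status: {status!r}")
    return status in TERMINAL_STATUSES
```

Raising on an unknown status, rather than treating it as unfinished, surfaces API changes early instead of polling forever.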