Submit Identification Job

POST

identify

Identify speaker with diarization

curl --request POST \
  --url https://api.pyannote.ai/v1/identify \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com/audio.wav",
  "webhook": "https://example.com/webhook",
  "numSpeakers": 2,
  "matching": {
    "exclusive": true,
    "threshold": 0
  },
  "voiceprints": [
    {
      "label": "John Doe",
      "voiceprint": "U29tZUJhc2U2NERhdGE"
    }
  ]
}'

{
  "jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d",
  "status": "created"
}

This endpoint allows you to create a new diarization with speaker identification from a remote audio URL.

Identification of a remote file (from a URL)

If you have a media file accessible via a URL, you can provide the URL to the file in the request body.

Typically you would use this method with a file stored in a cloud storage service such as Amazon S3. If bucket or object is private, you can follow this tutorial and create a signed url to give temporary access.

For performance reasons, the input audio must respect the following limits:

Maximum duration of 24hs
Maximum file size of 1GiB

Make sure the URL to the file is publicly and directly accessible, otherwise our endpoint won’t be able to read the file.

Some cloud storage services provide an indirect URL with a confirmation page to access the file. Make sure you provide a direct link to the file instead.

Identification of a Local File

If you have a media file stored locally, you can upload it to our temporary media storage and provide the URL to the file in the request body.

Typically you would use this method if you have a file stored on a local machine or device.

Follow the How to upload files tutorial to upload your file.

Number of speakers

By default, pyannoteAI will automatically detect the number of speakers present in the input audio, and there is no limit as to how many speakers can be detected.

It is common to diarize audio where the number of speakers is already known to be 2 (e.g. phone conversations or interviews). If this is your use case, you can add "numSpeakers": 2 to the body of your request. This will detect exactly 2 speakers, which typically results in better overall diarization performance.

Confidence scores

Confidence scores provide a measure of the certainty of the underlying diarization model in its predictions. These scores range from 0 to 100, with higher values indicating greater confidence.

To include confidence scores in your identification results, add "confidence": true to your request body. When enabled, the job output will include:

The confidence object containing:
- score, an array with the confidence score for each sample
- resolution, indicating the time interval in seconds between confidence score samples

For example, if the resolution is 0.02, it means each confidence score represents a 20-millisecond interval in the audio.

Confidence scores do not indicate the confidence of matching between the provided voiceprints and the speaker.

Confidence scores can be particularly useful for:

Identifying segments where the model is less certain, which may benefit from manual review
Filtering results based on a confidence threshold for higher accuracy
Analyzing the overall reliability of the identification for a given audio file

By default, confidence scores are not included in the output. To enable them, you must explicitly set "confidence": true in your request.

Getting results using a webhook

If the optional webhook param is set, the results will be sent to the provided URL. The webhook URL is where any update related to the identification job will be sent. The voiceprint will be sent as a JSON object in the request body.

Make sure the webhook URL is publicly accessible, otherwise our server cannot send the output

Please visit the Receiving webhooks documentation for more information.

Polling for job results

As an alternative to webhooks, you can poll the get job endpoint to retrieve job results.

To poll for job results, use the status field in the job response to check if the job is completed. Once the job is completed, you can retrieve the job results by accessing the output field.

The output field contains the job results and is only available when the job status is succeeded. Job results are automatically deleted after 24 hours of job completion.

A job is completed when the status field is either succeeded, failed, or canceled.

Be cautious of rate limits when polling. Excessive requests to check job status can lead to rate limiting, potentially disrupting your workflow. Check the rate limits documentation for more information.

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json

Response

200

application/json

The response is of type object.

Submit Voiceprint Job Get Job

Identify speaker with diarization

curl --request POST \
  --url https://api.pyannote.ai/v1/identify \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "url": "https://example.com/audio.wav",
  "webhook": "https://example.com/webhook",
  "numSpeakers": 2,
  "matching": {
    "exclusive": true,
    "threshold": 0
  },
  "voiceprints": [
    {
      "label": "John Doe",
      "voiceprint": "U29tZUJhc2U2NERhdGE"
    }
  ]
}'

{
  "jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d",
  "status": "created"
}

Getting Started

Tutorials

Support

API Reference

Webhooks

Submit Identification Job

Identification of a remote file (from a URL)

Identification of a Local File

Number of speakers

Confidence scores

Getting results using a webhook

Polling for job results

Authorizations

Body

Response

Getting Started

Tutorials

Support

API Reference

Webhooks

​Identification of a remote file (from a URL)

​Identification of a Local File

​Number of speakers

​Confidence scores

​Getting results using a webhook

​Polling for job results

Authorizations

Body

Response

Identification of a remote file (from a URL)

Identification of a Local File

Number of speakers

Confidence scores

Getting results using a webhook

Polling for job results