How to diarize an audio file

To create and receive a diarization using the pyannote API in Python, follow these steps:

Step 1: Authenticate Your API Requests

First, you need to authenticate your API requests using an API key. You can generate your API key from the pyannoteAI dashboard.

Here’s an example of how to authenticate using your API key:

API_KEY = 'your_api_key_here'
headers = {
    'Authorization': f'Bearer {API_KEY}'
}
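To avoid hard-coding the key in source files, the headers can be built from an environment variable. This is a minimal sketch: the variable name `PYANNOTE_API_KEY` and the helper `auth_headers` are assumptions made for illustration, not names defined by the API.

```python
import os


def auth_headers(env_var="PYANNOTE_API_KEY"):
    """Build the Authorization header from an environment variable.

    The variable name is a hypothetical default; use whatever name
    fits your deployment.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {"Authorization": f"Bearer {api_key}"}
```

The returned dictionary can be passed as `headers` in the requests shown in the following steps.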

Step 2: Create Diarization Job

To create a diarization job, you’ll send a POST request to the diarize endpoint with the URL of the audio file and a webhook URL where the results will be sent.

Here’s an example:

import requests

url = "https://api.pyannote.ai/v1/diarize"
API_KEY = "your_api_key_here"
file_url = 'https://example.com/sample.mp3'
webhook_url = 'https://example.com/your-webhook-url'
headers = {
    "Authorization": f"Bearer {API_KEY}"
}
data = {
    'webhook': webhook_url,
    'url': file_url
}
response = requests.post(url, headers=headers, json=data)

print(response.status_code)
# 200

print(response.json()) 
# {
#  "jobId": "bd7e97c9-0742-4a19-bd5a-9df519ce8c74",
#  "message": "Job added to queue",
#  "status": "pending"
# }

This queues the diarization job and returns a jobId. More details can be found in the API reference.
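The request above can be wrapped in a small function that fails loudly on an HTTP error or a missing jobId. This is a sketch under stated assumptions: the helper name `submit_diarization` and the 30-second timeout are illustrative choices, and `session` is assumed to be a `requests.Session` (or any object with a compatible `post` method).

```python
DIARIZE_URL = "https://api.pyannote.ai/v1/diarize"  # endpoint from the example above


def submit_diarization(session, api_key, file_url, webhook_url, timeout=30):
    """Queue a diarization job and return its jobId.

    `session` is expected to behave like requests.Session.
    """
    response = session.post(
        DIARIZE_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"url": file_url, "webhook": webhook_url},
        timeout=timeout,
    )
    # Raise an exception for 4xx/5xx responses instead of continuing silently.
    response.raise_for_status()
    body = response.json()
    job_id = body.get("jobId")
    if not job_id:
        raise ValueError(f"Response did not contain a jobId: {body!r}")
    return job_id
```

With the requests library installed, a call might look like `submit_diarization(requests.Session(), API_KEY, file_url, webhook_url)`. Keeping the returned jobId lets you match the webhook payload to the job that produced it.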

Getting a URL from a file

Audio files are typically stored in a cloud storage service such as Amazon S3. If yours is, you can generate a public (signed) URL for the file and use that URL in the API request. Make sure the URL is publicly accessible so that our servers can download the file.
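Before submitting a job, it can help to reject URLs that obviously cannot be reached from outside your machine. The check below is a rough sketch, not part of the API: the function name `looks_publicly_reachable` is hypothetical, and the check only inspects the URL text, so it cannot confirm that a DNS name actually resolves or that the file exists.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_reachable(url):
    """Return False for URLs that clearly point at a local or private host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so it is a DNS name; assume it may be public.
        return True
    # Loopback, private, and link-local addresses are not globally routable.
    return address.is_global
```

A URL such as `http://localhost:5000/sample.mp3` fails this check, which is a reminder that a locally served file needs a public address before the API can fetch it.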

If you want to expose a file URL manually, you can use the following code, which serves a file named sample.mp3 from a Python Flask server:

from flask import Flask, send_file

app = Flask(__name__)

@app.route('/sample.mp3', methods=['GET'])
def get_audio_file():
    # Replace this with your actual file path
    file_path = '/path/to/sample.mp3'
    return send_file(file_path, mimetype='audio/mpeg')

if __name__ == '__main__':
    app.run(port=5000)

Step 3: Webhook and Receiving the Results

The results of the diarization will be sent to the webhook URL you provided. The webhook payload will look like this:

{
    "jobId": "bd7e97c9-0742-4a19-bd5a-9df519ce8c74",
    "status": "succeeded",
    "output": {
        "diarization": [
            {
                "start": 1.2,
                "end": 3.4,
                "speaker": "SPEAKER_01"
            },
            ...
        ]
    }
}

Make sure your webhook server is set up to handle this JSON payload. For example, using Flask in Python:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/your-webhook-url', methods=['POST'])
def handle_webhook():
    data = request.json
    print(data)
    # Process the data as needed
    return jsonify({'status': 'received'}), 200

if __name__ == '__main__':
    app.run(port=5000)

To receive webhooks on a local server, you can use a tool like ngrok to expose it to the internet. To test webhooks without running a server, you can use a site like webhook.site.
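As one example of processing the payload, the segments can be summed into a total speaking time per speaker. This sketch assumes the payload has the shape shown above; the function name `speaking_time_per_speaker` is illustrative, and any status other than "succeeded" is simply rejected.

```python
from collections import defaultdict


def speaking_time_per_speaker(payload):
    """Return a dict mapping each speaker label to total seconds spoken."""
    if payload.get("status") != "succeeded":
        raise ValueError(f"Job not finished successfully: {payload.get('status')!r}")
    totals = defaultdict(float)
    for segment in payload["output"]["diarization"]:
        # Each segment carries start and end times in seconds.
        totals[segment["speaker"]] += segment["end"] - segment["start"]
    return dict(totals)
```

Inside the webhook handler, this could be called as `speaking_time_per_speaker(request.json)` in place of the `print(data)` line.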

Conclusion

By following these steps, you can create a diarization job, receive the results via webhook, and process them as needed. Make sure to handle errors and edge cases appropriately in your actual implementation.

For more details on how to create a diarization, check out the Diarization API reference.
