# How to diarize an audio file

> This tutorial shows how to diarize an audio file using the pyannoteAI API

### Prerequisites

Before you start, you'll need:

* A pyannoteAI account with credit or an active subscription
* An API key from your dashboard
* A publicly accessible audio file URL

For help creating an account and getting your API key, see the [quickstart guide](/quickstart). For pricing and charging details, see [Billing](/administration/billing).

## 1. Diarize API request

Send a POST request to the diarize endpoint with your audio file URL.

This example uses a sample audio file hosted on pyannoteAI servers. It's a 79-second recording with two speakers. You may use this URL to test the API: `https://files.pyannote.ai/marklex1min.wav`

<Note>
  **The URL must be a direct link to a publicly accessible audio file.** Make sure the URL points directly to the file (e.g., ends with `.wav`, `.mp3`, etc.) and is accessible without authentication.

  Typically, you'll use a signed URL from a cloud storage service such as AWS S3. **We also offer our own file upload solution.** For details on uploading audio files to our servers, see:

  * [How to upload an audio file](/tutorials/how-to-upload-files)
</Note>

<CodeGroup dropdown>
  ```python diarize.py theme={null}
  import requests

  url = "https://api.pyannote.ai/v1/diarize"
  api_key = "YOUR_API_KEY"  # In production, use environment variables: os.getenv("PYANNOTE_API_KEY")

  headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}
  data = {"url": "https://files.pyannote.ai/marklex1min.wav"}

  response = requests.post(url, headers=headers, json=data)

  if response.status_code != 200:
      print(f"Error: {response.status_code} - {response.text}")
  else:
      print(response.json())
  ```

  ```bash theme={null}
  # Set your API key as an environment variable in production
  # export PYANNOTE_API_KEY="your_api_key_here"
  curl -X POST "https://api.pyannote.ai/v1/diarize" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"url": "https://files.pyannote.ai/marklex1min.wav"}'
  ```

  ```typescript diarize.ts theme={null}
  const url = "https://api.pyannote.ai/v1/diarize";
  const apiKey = "YOUR_API_KEY"; // In production, use environment variables: process.env.PYANNOTE_API_KEY
  const headers = {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  };
  const data = {
    url: "https://files.pyannote.ai/marklex1min.wav",
  };

  const response = await fetch(url, {
    method: "POST",
    headers,
    body: JSON.stringify(data),
  });

  if (!response.ok) {
    console.error(`Error: ${response.status} - ${await response.text()}`);
  } else {
    console.log(await response.json());
  }
  ```
</CodeGroup>

The response will include a `jobId` that you can use to track the diarization job progress:

```json Example response theme={null}
{
  "jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d",
  "status": "created"
}
```
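
The polling step below needs this `jobId`. As a minimal sketch, assuming the response body has the shape shown above, a small helper (the name `extract_job_id` is hypothetical, not part of the API) can pull it out and fail loudly if it is missing:

```python
def extract_job_id(payload: dict) -> str:
    """Return the job ID from a diarize response body, or raise if it is missing."""
    job_id = payload.get("jobId")
    if not job_id:
        raise ValueError(f"No jobId in response: {payload}")
    return job_id


# Example using the response body shown above
example = {"jobId": "3c8a89a5-dcc6-4edb-a75d-ffd64739674d", "status": "created"}
job_id = extract_job_id(example)
print(job_id)
```

In the request script from step 1, this would be called as `extract_job_id(response.json())`.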

***

## 2. Get diarization result

Once you have a `jobId`, you can retrieve the results either by polling or with webhooks:

<Warning>
  **Job results are automatically deleted after 24 hours**, for all endpoints.
  Make sure to save your results in your own database.
</Warning>
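
Because results expire, it can help to persist each finished job as soon as you receive it. The sketch below writes the job payload to a local JSON file as a stand-in for your own database. The function name `save_job_result` and the `results/` directory layout are assumptions made for illustration, and the payload is stored as-is without assuming anything about the shape of `output`:

```python
import json
from pathlib import Path


def save_job_result(job: dict, directory: str = "results") -> Path:
    """Write a finished job payload to <directory>/<jobId>.json and return the path."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{job['jobId']}.json"
    # Store the payload unchanged so nothing is lost when the API deletes it
    path.write_text(json.dumps(job, indent=2), encoding="utf-8")
    return path
```

Call it with the JSON body of a succeeded job, whether that body came from polling or from a webhook.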

### <Icon icon="repeat" /> Polling

Poll the [get job](/api-reference/get-job) endpoint to check job status and retrieve results when complete.

<Note>
  Be cautious of rate limits when polling. Excessive requests can lead to rate
  limiting. **In production, we strongly recommend using webhooks instead.**
</Note>

<CodeGroup dropdown>
  ```python polling.py theme={null}
  import time

  import requests

  job_id = "YOUR_JOB_ID"  # From the diarize response
  api_key = "YOUR_API_KEY"  # In production, use environment variables: os.getenv("PYANNOTE_API_KEY")
  headers = {"Authorization": f"Bearer {api_key}"}

  while True:
      response = requests.get(
          f"https://api.pyannote.ai/v1/jobs/{job_id}", headers=headers
      )

      if response.status_code != 200:
          print(f"Error: {response.status_code} - {response.text}")
          break

      data = response.json()
      status = data["status"]

      if status in ["succeeded", "failed", "canceled"]:
          if status == "succeeded":
              print("Job completed successfully!")
              print(data["output"])
          else:
              print(f"Job {status}")
          break

      print(f"Job status: {status}, waiting...")
      time.sleep(10)  # Wait 10 seconds before polling again
  ```

  ```bash theme={null}
  # Set your API key as an environment variable in production
  # export PYANNOTE_API_KEY="your_api_key_here"
  # Poll for job status
  curl -H "Authorization: Bearer YOUR_API_KEY" \
    "https://api.pyannote.ai/v1/jobs/YOUR_JOB_ID"

  # Keep polling until status is "succeeded", "failed", or "canceled"
  ```

  ```typescript polling.ts theme={null}
  const jobId = "YOUR_JOB_ID"; // From the diarize response
  const apiKey = "YOUR_API_KEY"; // In production, use environment variables: process.env.PYANNOTE_API_KEY
  const headers = {
    Authorization: `Bearer ${apiKey}`,
  };

  async function pollJob() {
    while (true) {
      const response = await fetch(`https://api.pyannote.ai/v1/jobs/${jobId}`, {
        headers,
      });

      if (!response.ok) {
        console.error(`Error: ${response.status} - ${await response.text()}`);
        break;
      }

      const data = await response.json();
      const status = data.status;

      if (["succeeded", "failed", "canceled"].includes(status)) {
        if (status === "succeeded") {
          console.log("Job completed successfully!");
          console.log(data.output);
        } else {
          console.log(`Job ${status}`);
        }
        break;
      }

      console.log(`Job status: ${status}, waiting...`);
      await new Promise((resolve) => setTimeout(resolve, 10000)); // Wait 10 seconds
    }
  }

  pollJob();
  ```
</CodeGroup>
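
The examples above poll at a fixed 10-second interval with no upper bound. One way to be gentler on rate limits is to increase the wait between requests and give up after a deadline. The following is a sketch of that idea, not an official client: `poll_job`, its parameters, and the default delays are all assumptions. The HTTP call is passed in as a function so the loop can be tested without network access:

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_job(
    fetch_job: Callable[[], dict],
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until the job reaches a terminal status.

    The wait doubles after each attempt (capped at max_delay). A TimeoutError
    is raised if the next wait would exceed the total timeout.
    """
    delay = initial_delay
    waited = 0.0
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if waited + delay > timeout:
            raise TimeoutError(f"Job still {job['status']} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)


# Usage sketch (requires the `requests` package and a real job ID):
# import requests
# job = poll_job(lambda: requests.get(
#     f"https://api.pyannote.ai/v1/jobs/{job_id}", headers=headers
# ).json())
```

The usage sketch skips HTTP error handling for brevity. In practice, `fetch_job` should check the status code before returning the parsed body.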

### <Icon icon="webhook" /> Webhook

Specify a webhook URL when creating the diarization job to receive updates automatically when the job reaches a terminal status.

<Info>
  Webhooks are sent for terminal statuses only: `succeeded`, `failed`, and
  `canceled`. They are not sent for `pending`, `created`, or `running`.

  For `failed` and `canceled` jobs, payloads include `jobId` and `status`
  (without `output`).
</Info>

#### 1. Specify your webhook URL

Add the `webhook` parameter to your diarization request payload.
If you only need status updates (useful for smaller payloads), set
`webhookStatusOnly` to `true` (default is `false`):

<CodeGroup dropdown>
  ```python diarize_with_webhook.py theme={null}
  data = {
      "url": "https://files.pyannote.ai/marklex1min.wav",
      "webhook": "https://your-server.com/webhook"
  }
  ```

  ```bash theme={null}
  -d '{
  "url": "https://files.pyannote.ai/marklex1min.wav",
  "webhook": "https://your-server.com/webhook"
  }'
  ```

  ```typescript diarize_with_webhook.ts theme={null}
  const data = {
    url: "https://files.pyannote.ai/marklex1min.wav",
    webhook: "https://your-server.com/webhook"
  };
  ```
</CodeGroup>
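
The snippets above do not set `webhookStatusOnly`. As a sketch of how the two parameters combine, the helper below builds the request payload and adds `webhookStatusOnly` only when a webhook is set and status-only delivery is requested. The helper name `build_diarize_payload` and its arguments are hypothetical. Only the `url`, `webhook`, and `webhookStatusOnly` keys come from this tutorial:

```python
import json
from typing import Optional


def build_diarize_payload(
    audio_url: str,
    webhook_url: Optional[str] = None,
    status_only: bool = False,
) -> dict:
    """Build the JSON body for a diarize request, optionally with a webhook."""
    payload = {"url": audio_url}
    if webhook_url:
        payload["webhook"] = webhook_url
        if status_only:
            # Ask for status-only webhook payloads (the API default is false)
            payload["webhookStatusOnly"] = True
    return payload


payload = build_diarize_payload(
    "https://files.pyannote.ai/marklex1min.wav",
    webhook_url="https://your-server.com/webhook",
    status_only=True,
)
print(json.dumps(payload, indent=2))
```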

#### 2. Create a server exposing a webhook endpoint

The example below shows a simple server that accepts the webhook POST requests. You can use any web framework of your choice.

<CodeGroup dropdown>
  ```python webhook.py theme={null}
  from flask import Flask, request, jsonify

  app = Flask(__name__)

  @app.route('/webhook', methods=['POST'])
  def handle_webhook():
      data = request.json
      status = data.get('status')

      if status == 'succeeded':
          print("Diarization completed!")
          print("Job ID:", data['jobId'])
          if 'output' in data:
              print("Results:", data['output'])

      if status == 'failed':
          print("Job failed.")

      if status == 'canceled':
          print("Job canceled.")

      return jsonify({'status': 'received'}), 200

  if __name__ == '__main__':
      app.run(port=5000)
  ```

  ```typescript webhook.ts theme={null}
  import express from 'express';

  const app = express();
  app.use(express.json());

  app.post('/webhook', (req, res) => {
      const data = req.body;
      const status = data.status;

      if (status === 'succeeded') {
          console.log('Diarization completed!');
          console.log('Job ID:', data.jobId);
          if (data.output) {
              console.log('Results:', data.output);
          }
      }

      if (status === 'failed') {
          console.log('Job failed.');
      }

      if (status === 'canceled') {
          console.log('Job canceled.');
      }

      res.json({ status: 'received' });
  });

  app.listen(5000, () => {
      console.log('Server running on port 5000');
  });
  ```
</CodeGroup>
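
The handlers above only print. In a real service it may be easier to keep payload interpretation separate from the web framework so it can be unit tested. Here is a minimal sketch, assuming the payload fields described earlier (`jobId`, `status`, and `output` for succeeded jobs only). The function name `interpret_webhook` is hypothetical:

```python
from typing import Any, Optional, Tuple

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def interpret_webhook(payload: dict) -> Tuple[str, str, Optional[Any]]:
    """Return (job_id, status, output) from a webhook payload.

    output is None for failed or canceled jobs, and for succeeded jobs
    whose payload carries no output (for example with status-only webhooks).
    """
    job_id = payload.get("jobId")
    status = payload.get("status")
    if not job_id or status not in TERMINAL_STATUSES:
        raise ValueError(f"Unexpected webhook payload: {payload}")
    output = payload.get("output") if status == "succeeded" else None
    return job_id, status, output
```

Inside the Flask route, this would be called as `interpret_webhook(request.json)` before deciding what to store or log.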

<Tip>
  You can also use a tool like [ngrok](https://ngrok.com/) to expose your local server to the internet
  for testing webhooks, or use [webhook.site](https://webhook.site/) for quick
  testing.
</Tip>

**Learn more about webhooks:**

* [Receiving webhooks](/webhooks/receiving-webhooks) - Learn about webhook payloads, retries, and failure codes
* [Verifying webhooks](/webhooks/verifying-webhooks) - Learn how to verify webhook signatures to ensure requests are from pyannoteAI
