When working with audio diarization, you often have prior knowledge about the expected number of speakers or specific requirements for how overlapping speech should be handled. This tutorial covers the key configuration options available in pyannoteAI for speaker detection and exclusive diarization.

Number of speakers

By default, pyannoteAI automatically detects the number of speakers in your audio with no upper limit. However, you can improve accuracy and performance by providing speaker count constraints when you have this information.

Exact speaker count

When you know the exact number of speakers, use numSpeakers for better results. This is common for:
  • Phone conversations (2 speakers)
  • Interviews (2 speakers)
  • Panel discussions with known participants
  • Meeting recordings with known attendees
Setting numSpeakers typically results in better overall diarization performance since the model can optimize for a specific speaker count.
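
As a minimal sketch of what this looks like in practice, the snippet below builds a JSON request body that fixes the speaker count. Only the numSpeakers parameter comes from this tutorial; the helper name build_diarize_payload, the "url" key, and the example audio URL are assumptions made for illustration, so check the API reference for the exact request shape.

```python
import json


def build_diarize_payload(audio_url, num_speakers=None):
    """Build a JSON-serializable request body for a diarization job.

    The "url" key is an assumption made for this sketch; "numSpeakers" is the
    parameter described in this tutorial.
    """
    payload = {"url": audio_url}
    if num_speakers is not None:
        # Omit the key entirely to fall back to automatic speaker detection.
        payload["numSpeakers"] = num_speakers
    return payload


# A phone conversation with exactly two speakers (hypothetical audio URL).
payload = build_diarize_payload("https://example.com/call.wav", num_speakers=2)
print(json.dumps(payload, indent=2))
```

Leaving num_speakers unset produces a body without the key, which corresponds to the default automatic detection.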

Speaker count ranges

When the exact number is unknown but you have reasonable bounds, use minSpeakers and maxSpeakers:
  • minSpeakers: Minimum number of speakers to detect
  • maxSpeakers: Maximum number of speakers to detect
This is useful when there are optional participants in your recordings, such as:
  • Conference calls with variable attendance
  • Classroom recordings where some students may be absent
  • Broadcast content with variable guest counts
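
A range can be expressed in the request body in the same way. The sketch below is illustrative only: minSpeakers and maxSpeakers are the parameters described above, while the helper name build_range_payload, the "url" key, and the example URLs are assumptions.

```python
def build_range_payload(audio_url, min_speakers=None, max_speakers=None):
    """Build a request body that bounds the speaker count instead of fixing it.

    The "url" key is an assumption made for this sketch.
    """
    payload = {"url": audio_url}
    if min_speakers is not None:
        payload["minSpeakers"] = min_speakers
    if max_speakers is not None:
        payload["maxSpeakers"] = max_speakers
    return payload


# Conference call where between 3 and 8 people may join.
conference_call = build_range_payload(
    "https://example.com/standup.wav", min_speakers=3, max_speakers=8
)

# Broadcast with at least a host and one guest, and no known upper bound.
broadcast = build_range_payload("https://example.com/show.wav", min_speakers=2)
```

Either bound can be given alone; an omitted bound is simply left out of the body.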

Parameter rules and constraints

  • numSpeakers cannot be used together with minSpeakers or maxSpeakers
  • If both minSpeakers and maxSpeakers are set, minSpeakers must be ≤ maxSpeakers
  • Setting numSpeakers=2 is equivalent to minSpeakers=2 and maxSpeakers=2
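
These rules are easy to check on the client before submitting a job. The following is a minimal sketch of such a check, not part of pyannoteAI itself; the function name speaker_bounds and its error messages are hypothetical.

```python
def speaker_bounds(num_speakers=None, min_speakers=None, max_speakers=None):
    """Validate speaker-count options and return them as a (min, max) pair.

    A bound of None means "not constrained". This mirrors the rules listed
    above; it is a client-side sketch, not the service's own validation.
    """
    has_range = min_speakers is not None or max_speakers is not None
    if num_speakers is not None and has_range:
        raise ValueError(
            "numSpeakers cannot be used together with minSpeakers or maxSpeakers"
        )
    if num_speakers is not None:
        # An exact count is the same as a range whose bounds are equal.
        return (num_speakers, num_speakers)
    if (
        min_speakers is not None
        and max_speakers is not None
        and min_speakers > max_speakers
    ):
        raise ValueError("minSpeakers must be <= maxSpeakers")
    return (min_speakers, max_speakers)
```

For example, speaker_bounds(num_speakers=2) and speaker_bounds(min_speakers=2, max_speakers=2) return the same pair, while combining num_speakers with either bound raises ValueError.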

Exclusive diarization

By default, diarization results may include overlapping speech segments where multiple speakers are talking simultaneously. This is the most faithful representation of the audio, but some applications require non-overlapping speaker turns. Enable exclusive diarization by setting "exclusive": true. This provides:
  • Non-overlapping segments: Each time period is assigned to exactly one speaker
  • Easier integration: Simpler to combine with speech-to-text or other processing
Exclusive diarization results are provided in the exclusiveDiarization field of the job output, alongside the regular diarization results.
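
To illustrate the difference between the two result sets, the sketch below reads both from a hand-written job output and checks each for overlap. The exclusiveDiarization field name comes from this tutorial; the "diarization" key, the segment shape (speaker, start, end in seconds), and the sample values are assumptions made for the example.

```python
def has_overlap(segments):
    """Return True if any two segments overlap in time."""
    ordered = sorted(segments, key=lambda seg: seg["start"])
    # After sorting by start time, any overlap shows up between neighbours.
    return any(prev["end"] > cur["start"] for prev, cur in zip(ordered, ordered[1:]))


# Hand-written sample output; key names other than "exclusiveDiarization"
# and the segment shape are assumptions for this sketch.
job_output = {
    "diarization": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2},
        {"speaker": "SPEAKER_01", "start": 3.8, "end": 7.5},  # talks over SPEAKER_00
    ],
    "exclusiveDiarization": [
        {"speaker": "SPEAKER_00", "start": 0.0, "end": 3.8},
        {"speaker": "SPEAKER_01", "start": 3.8, "end": 7.5},
    ],
}

regular = job_output["diarization"]
exclusive = job_output["exclusiveDiarization"]
```

With this sample, has_overlap(regular) is True and has_overlap(exclusive) is False, which is the property downstream code can rely on.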

When to use exclusive diarization

Exclusive diarization is particularly useful for:
  • Transcription workflows: Easier to align with ASR output
  • Meeting minutes: Cleaner, more readable summaries
  • Content analysis: Simpler speaker turn analysis
  • Legal proceedings: Clear attribution of speech segments
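
For the transcription case, non-overlapping turns mean each word can be given a single speaker with a simple lookup. The sketch below shows one possible approach, matching on the word's midpoint; it is not a method prescribed by pyannoteAI. The word and segment shapes are assumptions, and the ASR output is hand-written sample data.

```python
def assign_speakers(words, exclusive_segments):
    """Label each word with the speaker whose turn contains the word's midpoint.

    Words that fall outside every segment get speaker None.
    """
    labeled = []
    for word in words:
        midpoint = (word["start"] + word["end"]) / 2
        speaker = next(
            (
                seg["speaker"]
                for seg in exclusive_segments
                if seg["start"] <= midpoint < seg["end"]
            ),
            None,
        )
        labeled.append({**word, "speaker": speaker})
    return labeled


# Hypothetical exclusive diarization segments and ASR word timings (seconds).
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 2.0},
    {"speaker": "SPEAKER_01", "start": 2.0, "end": 4.0},
]
words = [
    {"text": "hello", "start": 0.2, "end": 0.6},
    {"text": "hi", "start": 2.1, "end": 2.4},
    {"text": "bye", "start": 5.0, "end": 5.5},
]

labeled = assign_speakers(words, segments)
```

Because the segments do not overlap, at most one segment can match a given midpoint, so no tie-breaking rule is needed.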

Best practices

Start with automatic detection

When unsure about speaker count, begin with automatic detection (no parameters) to understand your audio content, then refine with constraints in subsequent processing.
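
One way to turn a first automatic pass into constraints is to count the distinct speakers it found and allow some slack around that number. This is a heuristic sketch only: the function name suggest_constraints, the slack of one speaker, and the segment shape are all assumptions, not pyannoteAI recommendations.

```python
def suggest_constraints(segments, slack=1):
    """Suggest minSpeakers/maxSpeakers from the speakers seen in a first pass."""
    observed = len({seg["speaker"] for seg in segments})
    return {
        "minSpeakers": max(1, observed - slack),  # never suggest fewer than one
        "maxSpeakers": observed + slack,
    }


# Hand-written result of a first run with automatic detection.
first_pass = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 5.0},
    {"speaker": "SPEAKER_01", "start": 5.0, "end": 9.0},
    {"speaker": "SPEAKER_02", "start": 9.0, "end": 12.0},
    {"speaker": "SPEAKER_00", "start": 12.0, "end": 15.0},
]

constraints = suggest_constraints(first_pass)
```

Treat the suggested range as a starting point and adjust it using what you know about the recordings.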

Use exact counts when possible

If you have reliable information about the speaker count, prefer numSpeakers over a range, since an exact count typically gives the best diarization performance.

Consider your use case

  • Analysis and research: Use regular diarization to capture natural speech patterns
  • Transcription and documentation: Consider exclusive diarization for cleaner output
  • Unknown number of speakers: Use speaker count ranges to handle variability

Test with your data

Different audio quality and recording conditions may affect how well the constraints work. Test with representative samples from your specific use case.