Prerequisites
- Diarization results from pyannoteAI
- Transcript segments from your chosen ASR service
Step 1: Get diarization segments
First, get diarization segments from a diarization job (see how to diarize and Get job). Here is an example of some diarization segments:
Example diarization segments
Each diarization segment contains start and end timestamps in seconds along with a speaker label.
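As a rough illustration of that shape, the sketch below uses made-up values (not real pyannoteAI output) and assumes each segment is a dictionary with `speaker`, `start`, and `end` keys; the key names and the `speaking_time` helper are assumptions for this example only.

```python
# Illustrative values only; key names ("speaker", "start", "end") are assumed.
diarization_segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2},
    {"speaker": "SPEAKER_01", "start": 4.5, "end": 9.0},
    {"speaker": "SPEAKER_00", "start": 9.3, "end": 12.0},
]


def speaking_time(segments):
    """Sum the duration in seconds of all segments for each speaker label."""
    totals = {}
    for seg in segments:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals


totals = speaking_time(diarization_segments)
```

A quick per-speaker total like this can be a useful sanity check that the timestamps are in seconds and that the labels look as expected before merging.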
Step 2: Get transcript segments with timestamps
Get the transcript segments with timestamps, based on the same audio, from your chosen ASR service. Here is an example of OpenAI gpt-4o-transcribe and whisper-1 API transcript output, with segment timestamps:
Example OpenAI transcript segments
The segments array contains start and end timestamps in seconds along with the transcribed text.
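To make the structure concrete, here is a minimal sketch with made-up values (not real API output). It assumes the response has already been parsed into a dictionary with a `segments` list whose items carry `start`, `end`, and `text`; the `id` and top-level `text` fields and the `extract_segments` helper are assumptions for illustration.

```python
# Illustrative values only; a parsed transcript response is assumed to look like this.
transcript = {
    "text": "Hello, how are you? I'm fine, thanks.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 4.0, "text": " Hello, how are you?"},
        {"id": 1, "start": 4.6, "end": 8.8, "text": " I'm fine, thanks."},
    ],
}


def extract_segments(response):
    """Keep only the fields needed for merging: start, end, and trimmed text."""
    return [
        {
            "start": float(seg["start"]),
            "end": float(seg["end"]),
            "text": seg["text"].strip(),
        }
        for seg in response["segments"]
    ]


transcript_segments = extract_segments(transcript)
```

Normalizing to plain `start`/`end`/`text` dictionaries keeps the merge step independent of which ASR service produced the transcript.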
Step 3: Merge results
Combine the diarization segments with the ASR transcript segments by aligning them based on their timestamps. You can use the following Python code taken from WhisperX - diarize.py to achieve this:
merge_diarization_asr.py
Merged diarization + ASR segments
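The timestamp alignment can be sketched in plain Python as follows. This is not the WhisperX code; it is a simplified illustration of the same general idea, assigning each transcript segment the speaker whose diarization turns overlap it the most. The sample data, key names, and the `overlap` and `assign_speakers` helpers are assumptions made for this example.

```python
# Illustrative inputs only; key names are assumed, values are made up.
diarization_segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2},
    {"speaker": "SPEAKER_01", "start": 4.5, "end": 9.0},
    {"speaker": "SPEAKER_00", "start": 9.3, "end": 12.0},
]
transcript_segments = [
    {"start": 0.0, "end": 4.0, "text": "Hello, how are you?"},
    {"start": 4.6, "end": 8.8, "text": "I'm fine, thanks."},
    {"start": 8.9, "end": 11.0, "text": "Good to hear."},
    {"start": 20.0, "end": 21.0, "text": "(no speaker detected here)"},
]


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(transcript_segments, diarization_segments, fallback=None):
    """Attach to each transcript segment the speaker with the largest total overlap."""
    merged = []
    for seg in transcript_segments:
        totals = {}
        for turn in diarization_segments:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        # Segments that overlap no diarization turn receive the fallback label.
        speaker = max(totals, key=totals.get) if totals else fallback
        merged.append({**seg, "speaker": speaker})
    return merged


merged = assign_speakers(transcript_segments, diarization_segments)
for seg in merged:
    print(f'[{seg["start"]:.1f}-{seg["end"]:.1f}] {seg["speaker"]}: {seg["text"]}')
```

Summing overlap per speaker, rather than taking the first overlapping turn, keeps the assignment stable when a transcript segment straddles a speaker change. Segments that span two speakers still receive a single label in this sketch; word-level timestamps would be needed to split them.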