# Diarization

The **Diarization** method automatically detects and separates speakers within a single audio channel, so it can handle more complex audio in which multiple speakers are mixed into the same channel. However, it is significantly slower, **taking 2-3 times longer to process** than transcription without diarization.

***

### Key Features

* **Use Case**: Suitable for mixed audio inputs where speakers are not on separate channels.
* **Accuracy**: Variable, dependent on the complexity of the audio and number of speakers.
* **Performance**: Slow, taking 2-3 times longer than non-diarized processing.
* **Recommendation**: Use only when [Multichannel](/speech-to-text/speaker-separation/multichannel.md) separation is not feasible.

***

## Transcribe with Diarization

### Introduction

This section describes the ways to use diarization with the Gowajee speech-to-text (STT) API. Diarization identifies and separates the speakers in an audio input. Choose the option below that matches what you know about the speakers in advance.

### Methods to Enable Diarization

1. [Enable Automatic Diarization](#enable-automatic-diarization)
2. [Specify Number of Speakers](#specify-number-of-speakers)
3. [Define Range of Speakers](#define-range-of-speakers)
4. [Reference Speakers (Recommended)](#reference-speakers-recommended)

***

### Enable Automatic Diarization

Set `diarization` to `true` to enable automatic detection of the number of speakers and speaker separation.

**Request:**

```http
POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "diarization": true
}
```

**Response:**

```json
{
    "type": "ASR_PULSE_WITH_DIARIZE",
    "amount": 6.248,
    "output": {
        "results": [
            {
                "transcript": "สวัสดีค่ะก่อนอื่นทางเราขอให้คุณยืนยันตัวตนผ่านระบบเสียง",
                "startTime": 0,
                "endTime": 5.117,
                "speaker": "SPEAKER_00"
            },
            {
                "transcript": "ได้เลยครับ",
                "startTime": 5.231,
                "endTime": 6.248,
                "speaker": "SPEAKER_01"
            }
        ],
        "duration": 6.248,
        "version": "2.2.0"
    }
}
```
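The request and response above can be exercised from Python with only the standard library. This is a minimal sketch, not an official client: it assumes the API is reached over HTTPS at `https://api.gowajee.ai`, and the helper names `build_diarization_payload`, `transcribe`, and `format_segments` are hypothetical. Because diarization takes 2-3 times longer than plain transcription, the sketch uses a generous default timeout (an arbitrary 300 seconds).

```python
import base64
import json
import urllib.request

# Assumption: the API is served over HTTPS at the host shown in the examples.
API_BASE = "https://api.gowajee.ai"


def build_diarization_payload(audio_bytes: bytes) -> dict:
    """Build the JSON body for automatic diarization (base64 audio + flag)."""
    return {
        "audioData": base64.b64encode(audio_bytes).decode("ascii"),
        "diarization": True,
    }


def transcribe(model: str, api_key: str, payload: dict, timeout: float = 300.0) -> dict:
    """POST a JSON payload to the transcribe endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/speech-to-text/{model}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
    # Diarization is 2-3x slower than plain transcription, so allow a long timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def format_segments(response: dict) -> list:
    """Render each diarized segment as '[start-end] speaker: transcript'."""
    return [
        f"[{seg['startTime']:.3f}-{seg['endTime']:.3f}] {seg['speaker']}: {seg['transcript']}"
        for seg in response["output"]["results"]
    ]
```

Passing the response shown above to `format_segments` yields one line per speaker turn, for example `[5.231-6.248] SPEAKER_01: ได้เลยครับ`.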

***

### Specify Number of Speakers

Set `diarization` to `true` and set `numSpeakers` (Integer) to the exact number of speakers in the audio. Use this option when you know the speaker count in advance.

**Request:**

```http
POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "diarization": true,
  "numSpeakers": 2
}
```

**Response:**

```json
{
    "type": "ASR_PULSE_WITH_DIARIZE",
    "amount": 6.248,
    "output": {
        "results": [
            {
                "transcript": "สวัสดีค่ะก่อนอื่นทางเราขอให้คุณยืนยันตัวตนผ่านระบบเสียง",
                "startTime": 0,
                "endTime": 5.117,
                "speaker": "SPEAKER_00"
            },
            {
                "transcript": "ได้เลยครับ",
                "startTime": 5.231,
                "endTime": 6.248,
                "speaker": "SPEAKER_01"
            }
        ],
        "duration": 6.248,
        "version": "2.2.0"
    }
}
```
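When the speaker count is known, the payload only adds `numSpeakers`. The sketch below uses hypothetical helper names and adds a client-side check that the value is a positive integer; that check is an assumption, not a documented API rule. It also lists the distinct `speaker` labels in a response, which is a quick way to compare what came back with the count you requested.

```python
import base64


def build_fixed_speakers_payload(audio_bytes: bytes, num_speakers: int) -> dict:
    """Build the JSON body for diarization with a known number of speakers."""
    # Client-side sanity check (an assumption, not a documented API rule):
    # numSpeakers must be an integer of at least 1. bool is excluded because
    # it is a subclass of int in Python.
    if isinstance(num_speakers, bool) or not isinstance(num_speakers, int) or num_speakers < 1:
        raise ValueError("numSpeakers must be a positive integer")
    return {
        "audioData": base64.b64encode(audio_bytes).decode("ascii"),
        "diarization": True,
        "numSpeakers": num_speakers,
    }


def distinct_speakers(response: dict) -> list:
    """Return the sorted, de-duplicated speaker labels found in a response."""
    return sorted({seg["speaker"] for seg in response["output"]["results"]})
```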

***

### Define Range of Speakers

Set `diarization` to `true` and define `minSpeakers` (Integer) and `maxSpeakers` (Integer). The API then detects the number of speakers automatically, within the specified range, and separates them.

**Request:**

```http
POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "diarization": true,
  "minSpeakers": 1,
  "maxSpeakers": 2
}
```

**Response:**

```json
{
    "type": "ASR_PULSE_WITH_DIARIZE",
    "amount": 6.248,
    "output": {
        "results": [
            {
                "transcript": "สวัสดีค่ะก่อนอื่นทางเราขอให้คุณยืนยันตัวตนผ่านระบบเสียง",
                "startTime": 0,
                "endTime": 5.117,
                "speaker": "SPEAKER_00"
            },
            {
                "transcript": "ได้เลยครับ",
                "startTime": 5.231,
                "endTime": 6.248,
                "speaker": "SPEAKER_01"
            }
        ],
        "duration": 6.248,
        "version": "2.2.0"
    }
}
```
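Building the body programmatically avoids hand-written JSON mistakes such as trailing commas. In this minimal sketch the helper names are hypothetical, and the check `1 <= minSpeakers <= maxSpeakers` is an assumed client-side rule rather than a documented API constraint.

```python
import base64
import json


def build_speaker_range_payload(audio_bytes: bytes, min_speakers: int, max_speakers: int) -> dict:
    """Build the JSON body for diarization with a bounded speaker count."""
    # Assumed client-side check: 1 <= minSpeakers <= maxSpeakers.
    if min_speakers < 1 or max_speakers < min_speakers:
        raise ValueError("expected 1 <= minSpeakers <= maxSpeakers")
    return {
        "audioData": base64.b64encode(audio_bytes).decode("ascii"),
        "diarization": True,
        "minSpeakers": min_speakers,
        "maxSpeakers": max_speakers,
    }


def to_request_body(payload: dict) -> bytes:
    """Serialize with json.dumps so the body is always valid JSON (no trailing commas)."""
    return json.dumps(payload).encode("utf-8")
```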

***

### Reference Speakers (Recommended)

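Set `refSpeakers` to a list of known speakers, each with a `name` and a voice sample in `audioData`. Using known speaker samples improves accuracy, and the results are labeled with the speakers' names (for example, `Adam`) instead of generic labels such as `SPEAKER_00`.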

**Request:**

* application/json

```http
POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "diarization": true,
  "refSpeakers": [
    {
      "name": "Adam",
      "audioData": "base64_encoded_voice_of_adam"
    },
    {
      "name": "Bill",
      "audioData": "base64_encoded_voice_of_bill"
    }
  ]
}
```
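The JSON variant can be assembled from a mapping of speaker names to voice samples. This is a sketch with hypothetical helper names; `load_ref_voices` names each speaker after the file stem (for example, `Adam.wav` becomes `Adam`), which is a convenience assumed here to mirror the multipart behavior, since in JSON you may set any `name` you like.

```python
import base64
from pathlib import Path


def _b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def build_ref_speakers_payload(audio_bytes: bytes, ref_voices: dict) -> dict:
    """Build the JSON body with reference speakers.

    ref_voices maps each speaker name to the raw bytes of a voice sample.
    """
    return {
        "audioData": _b64(audio_bytes),
        "diarization": True,
        "refSpeakers": [
            {"name": name, "audioData": _b64(voice)} for name, voice in ref_voices.items()
        ],
    }


def load_ref_voices(paths: list) -> dict:
    """Read voice samples from disk, naming each speaker after the file stem
    (e.g. 'Adam.wav' -> 'Adam')."""
    return {Path(p).stem: Path(p).read_bytes() for p in paths}
```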

* multipart/form-data

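For multipart/form-data uploads, send the main audio in the `audioData` field and each reference voice as a separate `refSpeakers` file field. The API uses each reference file's name as the speaker name (in the example below, `Adam.wav` and `Bill.wav` yield the speakers `Adam` and `Bill`).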

```http
POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
X-Api-Key: ${YOUR_API_KEY}
Content-Length: ${AUTO}
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW

------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="audioData"; filename="audio.wav"
Content-Type: audio/wav

(data)
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="refSpeakers"; filename="Adam.wav"
Content-Type: audio/wav

(data)
------WebKitFormBoundary7MA4YWxkTrZu0gW
Content-Disposition: form-data; name="refSpeakers"; filename="Bill.wav"
Content-Type: audio/wav

(data)
------WebKitFormBoundary7MA4YWxkTrZu0gW--
```
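The multipart request above can be produced without third-party libraries by encoding the parts by hand. This is a minimal sketch under stated assumptions: every file is sent as `audio/wav`, the boundary is a random hex string, the endpoint is reached over HTTPS, and `build_multipart_body` and `transcribe_multipart` are hypothetical helper names. Note that the final delimiter ends with `--`, as multipart/form-data requires.

```python
from __future__ import annotations

import json
import urllib.request
import uuid


def build_multipart_body(
    audio: tuple[str, bytes],
    refs: list[tuple[str, bytes]],
    boundary: str | None = None,
) -> tuple[bytes, str]:
    """Encode the main audio and reference voices as multipart/form-data.

    Each item is (filename, raw_bytes). Returns (body, content_type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    fields = [("audioData", audio)] + [("refSpeakers", ref) for ref in refs]
    body = bytearray()
    for field_name, (filename, data) in fields:
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
            "Content-Type: audio/wav\r\n"  # assumption: every file is WAV
            "\r\n"
        )
        body += header.encode("utf-8")
        body += data
        body += b"\r\n"
    # The closing delimiter carries a trailing "--".
    body += f"--{boundary}--\r\n".encode("utf-8")
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def transcribe_multipart(model: str, api_key: str, body: bytes, content_type: str,
                         timeout: float = 300.0) -> dict:
    """POST a prebuilt multipart body; urllib fills in Content-Length for bytes data."""
    request = urllib.request.Request(
        f"https://api.gowajee.ai/v1/speech-to-text/{model}/transcribe",  # assumed HTTPS
        data=body,
        headers={"Content-Type": content_type, "X-Api-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```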

**Response:**

```json
{
    "type": "ASR_PULSE_WITH_DIARIZE",
    "amount": 6.248,
    "output": {
        "results": [
            {
                "transcript": "สวัสดีครับก่อนอื่นทางเราขอให้คุณยืนยันตัวตนผ่านระบบเสียง",
                "startTime": 0,
                "endTime": 5.117,
                "speaker": "Adam"
            },
            {
                "transcript": "ได้เลยครับ",
                "startTime": 5.231,
                "endTime": 6.248,
                "speaker": "Bill"
            }
        ],
        "duration": 6.248,
        "version": "2.2.0"
    }
}
```
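With reference speakers, the `speaker` field carries the names you supplied, so results can be aggregated per person directly. The hypothetical helper below totals each speaker's talk time from `startTime` and `endTime`, assuming both are in seconds (consistent with `duration` in the examples).

```python
from collections import defaultdict


def speaking_time_by_speaker(response: dict) -> dict:
    """Sum (endTime - startTime) per speaker label, in seconds."""
    totals = defaultdict(float)
    for seg in response["output"]["results"]:
        totals[seg["speaker"]] += seg["endTime"] - seg["startTime"]
    # Round to milliseconds, matching the precision of the timestamps in the examples.
    return {speaker: round(seconds, 3) for speaker, seconds in totals.items()}
```

For the response above, this gives about 5.117 seconds for `Adam` and 1.017 seconds for `Bill`.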

***

