Multichannel

The Multichannel method separates speakers by audio channel, treating each channel of the input as one speaker. This is particularly useful for recordings such as customer service calls, where each speaker is captured on a dedicated channel (e.g., channel 1 for the agent and channel 2 for the customer).

When each speaker is recorded on a separate channel, this method is highly recommended over Diarization because it is more accurate and more efficient.


Key Features

  • Use Case: Ideal for structured audio inputs in which each speaker has a dedicated channel, such as customer service recordings.

  • Accuracy: High, because each speaker is taken directly from a separate channel.

  • Performance: Efficient and fast.

  • Recommendation: Highly recommended over Diarization for applications where audio channels are clearly separated.
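Because this method relies on the recording actually having one channel per speaker, it can help to check the channel count before sending a file. The sketch below assumes the recording is a WAV file and uses Python's standard wave module. The helper name count_wav_channels is hypothetical, and this page does not state which audio formats the API accepts.

```python
import io
import wave


def count_wav_channels(wav_bytes: bytes) -> int:
    """Return the number of channels declared in a WAV file's header."""
    # wave.open accepts any file-like object, so the bytes are wrapped in BytesIO.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getnchannels()
```

A recording that reports a single channel has no per-speaker separation to exploit, so Diarization may be the better fit in that case.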


Transcribing Multichannel Audio

Required Parameters

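  1. multichannel (boolean): Set multichannel to true to identify speakers based on each channel of the audio input. Each result in the response then carries a channel field indicating which channel it came from (numbered from 0 in the example response below).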

Example Request

POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "multichannel": true
}
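As a minimal sketch, the request above could be assembled in Python with only the standard library. The function name build_multichannel_request is hypothetical, the use of HTTPS is an assumption (the example shows only the host and path), and the model name and API key must be supplied by the caller.

```python
import base64
import json
import urllib.request

API_HOST = "api.gowajee.ai"  # host taken from the example request


def build_multichannel_request(
    audio_bytes: bytes, model: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a transcribe request with multichannel enabled."""
    payload = {
        # The audio is sent as a base64 string in "audioData".
        "audioData": base64.b64encode(audio_bytes).decode("ascii"),
        # Ask the service to treat each channel as a separate speaker.
        "multichannel": True,
    }
    return urllib.request.Request(
        # Assumption: the endpoint is served over HTTPS.
        url=f"https://{API_HOST}/v1/speech-to-text/{model}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it, and the response body can then be parsed with json.load.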

Example Response

{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {
        "transcript": "วันนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517,
        "channel": 0
      },
      {
        "transcript": "พรุ่งนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517,
        "channel": 1
      }
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
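To work with a response like the one above, the results can be grouped by their channel field. The following is a minimal sketch that assumes the parsed JSON has the shape shown in the example. The function name transcripts_by_channel is hypothetical, and mapping a channel number to a role (such as agent or customer) depends on how the recording was made.

```python
from collections import defaultdict


def transcripts_by_channel(response: dict) -> dict[int, list[str]]:
    """Group transcripts by channel, ordered by start time within each channel."""
    grouped = defaultdict(list)
    results = response["output"]["results"]
    for item in sorted(results, key=lambda r: r["startTime"]):
        grouped[item["channel"]].append(item["transcript"])
    return dict(grouped)


# Trimmed copy of the example response.
example = {
    "output": {
        "results": [
            {"transcript": "วันนี้กินอะไรดี", "startTime": 0, "endTime": 4.517, "channel": 0},
            {"transcript": "พรุ่งนี้กินอะไรดี", "startTime": 0, "endTime": 4.517, "channel": 1},
        ]
    }
}

print(transcripts_by_channel(example))
# {0: ['วันนี้กินอะไรดี'], 1: ['พรุ่งนี้กินอะไรดี']}
```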
