Real-time Transcription

Quick Start

The quickest way to start using Gowajee's real-time transcription API is the example GitHub repository.

GitHub example repository: https://github.com/Gowajee-ai/gowajee-streaming-api-example

The repository contains a sample implementation that shows how to connect to the WebSocket endpoint, send audio data, and handle the responses.

Steps to Run the Example:

1. Clone the repository:
   ```
   git clone https://github.com/Gowajee-ai/gowajee-streaming-api-example.git
   ```
2. Navigate to the project directory:
   ```
   cd gowajee-streaming-api-example
   ```
3. Follow the setup instructions in the repository's README file to install dependencies and configure the example.
4. Run the example script to start sending audio data to the API and receive transcriptions in real time.

By following these steps, you can quickly integrate and test Gowajee's real-time transcription capabilities in your own applications.

Details

Protocol: WebSocket
Endpoint: wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe

Note

Supported Audio Format: PCM 16-bit only.

(Many audio libraries record PCM 16-bit by default.)

Pulse Code Modulation (PCM) is a method for digitally representing analog signals. For audio, PCM is a standard format for storing and transmitting uncompressed audio data; because no compression is applied, it is well suited to high-quality audio applications. The "16-bit" in PCM 16-bit refers to the bit depth, the number of bits used to represent each audio sample (16 bits, or 2 bytes, per sample).
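To make the format concrete, here is a minimal Python sketch that converts floating-point samples in the range [-1.0, 1.0] into PCM 16-bit bytes. It assumes mono audio and little-endian byte order (the usual layout, for example in WAV files); this page does not state the byte order the API expects, and `float_to_pcm16` is a hypothetical helper name.

```python
import struct
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> bytes:
    """Encode float samples in [-1.0, 1.0] as signed 16-bit little-endian PCM.

    Assumptions (not stated by the API docs): mono audio, little-endian order.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale into the int16 range
    # "<" = little-endian, "h" = signed 16-bit integer (2 bytes per sample)
    return struct.pack(f"<{len(ints)}h", *ints)
```

Under these assumptions, one second of 16000 Hz audio becomes 16000 samples × 2 bytes = 32,000 bytes.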

Supported Models

| Model | Value |
|-------|-------|
| Pulse | pulse |

Headers

| Name | Type | Required | Description |
|------|------|----------|-------------|
| x-api-key | string | Yes | API key used to access the service |

Connection

1. Open a WebSocket connection to the endpoint URL, sending the x-api-key header.
2. Send a configuration message in JSON format as the first message after connecting.
3. Send subsequent messages containing raw audio data chunks in PCM 16-bit format.
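The three connection steps can be sketched in Python as below. This is a minimal sketch, not the official client. It assumes the third-party `websockets` package (version 13 or later, which provides `websockets.asyncio.client.connect` with an `additional_headers` argument), that raw audio is sent as binary frames, and that the server answers every audio chunk with exactly one JSON message and sends nothing in reply to the configuration message. `transcribe` is a hypothetical name.

```python
import json
from typing import AsyncIterator, Iterable

URL = "wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe"


async def transcribe(api_key: str, pcm_chunks: Iterable[bytes]) -> AsyncIterator[dict]:
    """Stream PCM 16-bit chunks and yield each parsed server response."""
    # Third-party dependency (pip install "websockets>=13"), imported lazily so
    # this module can be loaded without it.
    from websockets.asyncio.client import connect

    # The API key travels in the x-api-key header.
    async with connect(URL, additional_headers={"x-api-key": api_key}) as ws:
        # Step 2: the configuration message must be the first message sent.
        await ws.send(json.dumps({"sampleRate": 16000}))
        # Step 3: send each raw audio chunk, then read the reply to it
        # (assumes exactly one JSON response per chunk).
        for chunk in pcm_chunks:
            await ws.send(chunk)  # bytes are sent as a binary frame
            yield json.loads(await ws.recv())
```

A caller could wrap `async for response in transcribe(api_key, chunks):` in a coroutine started with `asyncio.run(...)` and inspect `response["output"]["event"]`. Waiting for one reply per chunk keeps the sketch short; a production client would more likely send and receive concurrently.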

Configuration Message

Example (the // comments are explanatory only; remove them before sending, because JSON does not allow comments):

```json
{
  "sampleRate": 16000, // Required: Audio sample rate in Hz (must be 16000 for this endpoint)
  "boostWordList": ["โกวาจี"], // Optional: List of words to boost confidence score (default: empty)
  "boostWordScore": 5 // Optional: Confidence score boost for words in the list (default: 0)
}
```

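| Name | Type | Required | Description |
|------|------|----------|-------------|
| sampleRate | number | Yes | Audio sample rate in Hz (must be 16000 for this endpoint) |
| boostWordList | string[] | No | List of words whose confidence score is boosted (default: empty) |
| boostScore | integer | No | Confidence score boost applied to the words in boostWordList (default: 0) |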
Audio Data

Send raw audio data chunks after the configuration message.
The optimal chunk size is 0.5 seconds of audio.
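At 16000 Hz, a 0.5-second chunk holds 8,000 samples per channel, which is 16,000 bytes for mono PCM 16-bit audio. As one way to produce such chunks, the Python sketch below reads a WAV file with the standard `wave` module. It assumes the recording is already mono, 16-bit, and 16 kHz (it raises `ValueError` otherwise rather than resampling), and `wav_chunks` is a hypothetical helper name.

```python
import wave
from typing import BinaryIO, Iterator, Union


def wav_chunks(source: Union[str, BinaryIO], seconds: float = 0.5) -> Iterator[bytes]:
    """Yield raw PCM chunks of `seconds` length from a 16-bit, 16 kHz, mono WAV."""
    with wave.open(source, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("expected PCM 16-bit (sample width of 2 bytes)")
        if w.getframerate() != 16000:
            raise ValueError("expected a 16000 Hz sample rate")
        if w.getnchannels() != 1:
            raise ValueError("expected mono audio (an assumption of this sketch)")
        frames_per_chunk = int(w.getframerate() * seconds)  # 8000 frames for 0.5 s
        while True:
            data = w.readframes(frames_per_chunk)  # raw PCM bytes, WAV header excluded
            if not data:
                break
            yield data
```

The last chunk may be shorter than 0.5 seconds when the recording length is not a multiple of the chunk duration.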

Response

The server responds to each chunk of data it receives, and each response includes a speech event for that chunk. If no speech is detected, the event is "Silent". If speech is detected, the event is "SpeakOn" and the response includes a partial transcript. When the speech ends (end of a sentence), or when the server finishes the current transcription and starts a new one, the event is "SpeakOff" and the response includes the final transcript.
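One way to turn this event stream into text is sketched below in Python. `TranscriptAssembler` is a hypothetical helper that works on responses already parsed into dictionaries. It assumes, as the examples suggest (the SpeakOn and SpeakOff examples share a startTime of 11.0 and the final text extends the partial), that each SpeakOn transcript covers the whole phrase so far, so it replaces rather than extends the previous partial.

```python
from typing import List, Optional


class TranscriptAssembler:
    """Tracks the latest partial transcript and the list of finished phrases."""

    def __init__(self) -> None:
        self.partial: Optional[str] = None  # latest SpeakOn text, if any
        self.final: List[str] = []          # one entry per SpeakOff

    def handle(self, response: dict) -> None:
        output = response["output"]
        event = output["event"]
        results = output.get("results")  # null (None) for "Silent"
        if event == "SpeakOn" and results:
            # Assumed cumulative: the new partial replaces the old one.
            self.partial = results["transcript"]
        elif event == "SpeakOff" and results:
            # Final text for the phrase: keep it and clear the partial.
            self.final.append(results["transcript"])
            self.partial = None
        # "Silent" (or any other event) leaves the state unchanged.
```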

Response Events

Silent

No speech was detected in the received audio chunk.

Example JSON

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 0.5,
  "output": {
    "event": "Silent",
    "results": null
  }
}
```

SpeakOn

Speech detected, partial transcript provided.

Example JSON

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 14.001,
  "output": {
    "event": "SpeakOn",
    "results": {
      "transcript": "This is the",
      "startTime": 11.0,
      "endTime": 13.001
    },
    "version": "1.0.0",
    "duration": 14.001
  }
}
```

SpeakOff

End of the spoken phrase; the final recognized text is provided.

Example JSON

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 16.001,
  "output": {
    "event": "SpeakOff",
    "results": {
      "transcript": "This is the complete sentence",
      "startTime": 11.0,
      "endTime": 16.001
    },
    "version": "1.0.0",
    "duration": 16.001
  }
}
```
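All three payloads share one shape, which the Python sketch below maps onto a small dataclass. It reads only `output.event` and `output.results`, treating `"results": null` as "no transcript", and leaves `type`, `amount`, `version`, and `duration` alone because this page does not define their meaning. `PulseEvent` and `parse_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class PulseEvent:
    event: str                   # "Silent", "SpeakOn", or "SpeakOff"
    transcript: Optional[str]    # None when results is null
    start_time: Optional[float]  # results.startTime
    end_time: Optional[float]    # results.endTime


def parse_response(raw: str) -> PulseEvent:
    """Parse one text frame from the stream into a PulseEvent."""
    output = json.loads(raw)["output"]
    results = output.get("results") or {}  # null -> empty dict
    return PulseEvent(
        event=output["event"],
        transcript=results.get("transcript"),
        start_time=results.get("startTime"),
        end_time=results.get("endTime"),
    )
```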

Limitations

This endpoint currently only supports PCM 16-bit audio format.
The configuration message currently only allows specifying the sample rate, boost word list, and boost word score.


Last updated 11 months ago