Real-time Transcription


Last updated 11 months ago

Quick Start

To quickly start using Gowajee's real-time transcription API, you can use the provided GitHub repository example.

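GitHub Example Repository: Gowajee Real-time Transcription API Example (https://github.com/Gowajee-ai/gowajee-streaming-api-example)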

This repository contains a sample implementation demonstrating how to connect to the WebSocket endpoint, send audio data, and handle responses.

Steps to Run the Example:

  1. Clone the repository:

    git clone https://github.com/Gowajee-ai/gowajee-streaming-api-example.git
  2. Navigate to the project directory:

    cd gowajee-streaming-api-example
  3. Follow the setup instructions in the repository's README file to install dependencies and configure the example.

  4. Run the example script to start sending audio data to the API and receive transcriptions in real time.

By following these steps, you can quickly integrate and test Gowajee's real-time transcription capabilities in your own applications.


Detail

  • Protocol: WebSocket

  • Endpoint: wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe


Note

Supported Audio Format: PCM 16-bit only.

(Most sound libraries default to using PCM 16-bit for audio recording.)

Pulse Code Modulation (PCM) is a method used to digitally represent analog signals. In the context of audio, PCM is a standard format for storing and transmitting uncompressed audio data, which makes it well suited to high-quality audio applications. The "16-bit" in PCM 16-bit refers to the bit depth, which is the number of bits used to represent each audio sample.
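
If your audio source produces floating-point samples rather than PCM 16-bit, it has to be converted before sending. The following is a minimal sketch of that conversion using only the Python standard library. The function name float_to_pcm16 is hypothetical, and little-endian byte order is an assumption, since this page does not state the expected byte order.

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit PCM bytes.

    Assumption: little-endian byte order ("<" in the struct format).
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    return struct.pack("<%dh" % len(ints), *ints)

pcm = float_to_pcm16([0.0, -1.0, 1.0, 2.0])
print(len(pcm))  # each sample occupies 2 bytes
```

Each sample becomes exactly two bytes, which is what "16-bit" bit depth means in practice.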


Supported Models

| Model | Value |
| --- | --- |
| Pulse | pulse |


Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| x-api-key | string | Yes | An API key to access the service |


Connection

  1. Open a WebSocket connection to the provided URL.

  2. Send a configuration message in JSON format as the first message after connecting.

  3. Send subsequent messages containing raw audio data chunks in the PCM 16-bit format.
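
The message order above can be sketched without committing to a particular WebSocket library. In this sketch, run_session and connection_headers are hypothetical helper names, and send stands for whatever "send one message" function your WebSocket client provides. Sending the configuration as a text message and the audio as binary messages is an assumption; the example repository shows the behavior the service actually expects.

```python
import json

ENDPOINT = "wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe"

def connection_headers(api_key):
    """Headers to pass when opening the WebSocket connection."""
    return {"x-api-key": api_key}

def run_session(send, pcm_bytes, chunk_bytes=16000):
    """Send the configuration message first, then the raw audio in chunks.

    `send` is any callable that transmits one message (hypothetical; supplied
    by your WebSocket client). `chunk_bytes` is an illustrative default.
    """
    # Step 2: the first message after connecting is the JSON configuration.
    send(json.dumps({"sampleRate": 16000}))
    # Step 3: every later message is a chunk of raw PCM 16-bit audio.
    for start in range(0, len(pcm_bytes), chunk_bytes):
        send(pcm_bytes[start:start + chunk_bytes])

# Dry run with a list standing in for the socket.
sent = []
run_session(sent.append, b"\x00" * 40000)
print(len(sent))
```

Injecting send keeps the ordering logic testable offline; in a real client you would open the connection to ENDPOINT with connection_headers(...) and pass the client's send method.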


Configuration Message

{
  "sampleRate": 16000,
  "boostWordList": ["โกวาจี"],
  "boostWordScore": 5
}

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| sampleRate | number | Yes | Audio sample rate in Hz (must be 16000 for this endpoint) |
| boostWordList | string[] | No | List of words to boost confidence score (default: empty) |
| boostWordScore | integer | No | Confidence score boost for words in the list (default: 0) |
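
A configuration message can be assembled programmatically. The sketch below uses the key names from the JSON example in this section (sampleRate, boostWordList, boostWordScore); the function name build_config is hypothetical, and the check on the sample rate simply mirrors the "must be 16000" requirement.

```python
import json

def build_config(boost_words=None, boost_score=None, sample_rate=16000):
    """Return the configuration message as a JSON string."""
    if sample_rate != 16000:
        raise ValueError("this endpoint requires sampleRate 16000")
    config = {"sampleRate": sample_rate}
    # Optional fields are only included when the caller provides them.
    if boost_words:
        config["boostWordList"] = list(boost_words)
    if boost_score is not None:
        config["boostWordScore"] = int(boost_score)
    # ensure_ascii=False keeps Thai words readable in the serialized message.
    return json.dumps(config, ensure_ascii=False)

message = build_config(["โกวาจี"], 5)
print(message)
```

Omitting the optional keys leaves the server defaults in effect.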


Audio Data

  • Send raw audio data chunks following the configuration message.

  • The optimal chunk size is 0.5 seconds.
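
To turn the 0.5-second recommendation into a byte count, multiply the sample rate by the duration and by the bytes per sample. The sketch below assumes mono audio (one channel), which this page does not state explicitly; chunk_size_bytes and iter_chunks are hypothetical helper names.

```python
def chunk_size_bytes(sample_rate=16000, seconds=0.5, bytes_per_sample=2, channels=1):
    """Bytes in one chunk of PCM audio (mono is an assumption)."""
    return int(sample_rate * seconds) * bytes_per_sample * channels

def iter_chunks(pcm_bytes, size):
    """Yield consecutive slices of `size` bytes; the last one may be shorter."""
    for start in range(0, len(pcm_bytes), size):
        yield pcm_bytes[start:start + size]

size = chunk_size_bytes()           # 16000 samples/s * 0.5 s * 2 bytes
audio = b"\x00" * 38400             # 1.2 seconds of silence at 16 kHz, 16-bit mono
lengths = [len(c) for c in iter_chunks(audio, size)]
print(size, lengths)
```

With these assumptions, one 0.5-second chunk is 16000 bytes.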


Response

The server responds to each audio chunk it receives with a speech event for that chunk. If no speech is detected, the event is "Silent". If speech is detected, the event is "SpeakOn" and includes a partial transcript. When the speech ends (end of a sentence), or when the server finishes the current transcription and starts a new one, the event is "SpeakOff".


Response Events

Silent

No speech was detected in the received audio chunk.

Example JSON

{
  "type": "ASR_PULSE_STREAM",
  "amount": 0.5,
  "output": {
    "event": "Silent",
    "results": null
  }
}

SpeakOn

Speech was detected; a partial transcript is provided.

Example JSON

{
  "type": "ASR_PULSE_STREAM",
  "amount": 14.001,
  "output": {
    "event": "SpeakOn",
    "results": {
      "transcript": "This is the",
      "startTime": 11.0,
      "endTime": 13.001
    },
    "version": "1.0.0",
    "duration": 14.001
  }
}

SpeakOff

The spoken phrase has ended; the final recognized text is provided.

Example JSON

{
  "type": "ASR_PULSE_STREAM",
  "amount": 16.001,
  "output": {
    "event": "SpeakOff",
    "results": {
      "transcript": "This is the complete sentence",
      "startTime": 11.0,
      "endTime": 16.001
    },
    "version": "1.0.0",
    "duration": 16.001
  }
}
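
A client typically dispatches on output.event. The following is a minimal sketch based only on the example messages in this section: handle_message is a hypothetical function that returns the partial transcript for "SpeakOn", appends the final transcript to a list for "SpeakOff", and ignores "Silent".

```python
import json

def handle_message(raw, finals):
    """Process one server message; return a partial transcript or None."""
    msg = json.loads(raw)
    output = msg.get("output") or {}
    event = output.get("event")
    results = output.get("results") or {}   # "results" is null for Silent
    if event == "SpeakOn":
        return results.get("transcript", "")        # partial, may still change
    if event == "SpeakOff":
        finals.append(results.get("transcript", ""))  # final text for the phrase
    return None

finals = []
silent = '{"type": "ASR_PULSE_STREAM", "amount": 0.5, "output": {"event": "Silent", "results": null}}'
speak_on = '{"type": "ASR_PULSE_STREAM", "amount": 14.001, "output": {"event": "SpeakOn", "results": {"transcript": "This is the", "startTime": 11.0, "endTime": 13.001}}}'
speak_off = '{"type": "ASR_PULSE_STREAM", "amount": 16.001, "output": {"event": "SpeakOff", "results": {"transcript": "This is the complete sentence", "startTime": 11.0, "endTime": 16.001}}}'

partials = [handle_message(m, finals) for m in (silent, speak_on, speak_off)]
print(partials, finals)
```

Keeping partial and final transcripts separate lets a user interface overwrite the in-progress line on each "SpeakOn" and commit it on "SpeakOff".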

Limitations

  • This endpoint currently only supports PCM 16-bit audio format.

  • The configuration message currently only allows specifying the sample rate, boost word list, and boost word score.

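boostWordList: Add specific words to increase the chance of these words appearing in results. For details, see the Words Boosting page.

boostWordScore: A number from 1 to 20 that increases the chance of the words in boostWordList appearing in results. For details, see the Words Boosting page.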
