# Real-time Transcription

## Quick Start

To quickly start using Gowajee's real-time transcription API, you can use the provided GitHub repository example.

**GitHub Example Repository**: [Gowajee Real-time Transcription API Example](https://github.com/Gowajee-ai/gowajee-streaming-api-example)

This repository contains a sample implementation demonstrating how to connect to the WebSocket endpoint, send audio data, and handle responses.

### Steps to Run the Example

1. Clone the repository:

   ```sh
   git clone https://github.com/Gowajee-ai/gowajee-streaming-api-example.git
   ```
2. Navigate to the project directory:

   ```sh
   cd gowajee-streaming-api-example
   ```
3. Follow the setup instructions in the repository's README file to install dependencies and configure the example.
4. Run the example script to start sending audio data to the API and receive transcriptions in real time.

By following these steps, you can quickly integrate and test Gowajee's real-time transcription capabilities in your own applications.

***

## Detail

* Protocol: `WebSocket`
* Endpoint: `wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe`

***

## Note

**Supported Audio Format: PCM 16-bit only.**

(Most sound libraries default to using PCM 16-bit for audio recording.)

> **Pulse Code Modulation (PCM)** is a method for digitally representing analog signals. In audio, PCM is a standard format for storing and transmitting uncompressed data, which makes it well suited to high-quality audio applications. The "16-bit" in PCM 16-bit refers to the bit depth: the number of bits used to represent each audio sample.
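
If an audio source produces floating-point samples rather than 16-bit integers, they need converting before being sent. The sketch below shows one way to do that with the standard library. It assumes signed, little-endian samples, which is the common layout for PCM 16-bit but is not stated on this page; `floats_to_pcm16` is a hypothetical helper name.

```python
import struct

def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to PCM 16-bit bytes.

    Assumption: signed, little-endian samples (not confirmed by this page).
    """
    ints = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))   # keep values in range
        ints.append(int(round(clipped * 32767)))  # scale to the 16-bit range
    # "<" = little-endian, "h" = signed 16-bit integer, one per sample
    return struct.pack("<%dh" % len(ints), *ints)
```

Each sample occupies two bytes, so one second of single-channel audio at 16000 Hz comes to 32000 bytes.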

***

### Supported Models

| Model | Value |
| ----- | ----- |
| Pulse | pulse |

***

### Headers

<table><thead><tr><th width="141">Name</th><th width="96">Type</th><th width="121">Required</th><th>Description</th></tr></thead><tbody><tr><td>x-api-key</td><td>string</td><td>Yes</td><td>An API key to access the service</td></tr></tbody></table>

***

## Connection

1. Open a WebSocket connection to the provided URL.
2. Send a configuration message in JSON format as the first message after connecting.
3. Send subsequent messages containing raw audio data chunks in the PCM 16-bit format.
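
The three steps above can be sketched as a small coroutine. This is a minimal illustration, not the official client: `stream_pcm` is a hypothetical name, `ws` stands for any already-open WebSocket connection object exposing awaitable `send` and `recv` methods, and the sketch assumes the server sends exactly one JSON reply per audio chunk and no reply to the configuration message.

```python
import asyncio
import json

async def stream_pcm(ws, pcm_chunks, sample_rate=16000):
    """Send the configuration message, then each audio chunk, collecting replies.

    `ws` is assumed to be an open connection with awaitable send()/recv().
    """
    # Step 2: the first message after connecting is the JSON configuration.
    await ws.send(json.dumps({"sampleRate": sample_rate}))

    replies = []
    for chunk in pcm_chunks:
        # Step 3: every later message is a chunk of raw PCM 16-bit bytes.
        await ws.send(chunk)
        # Assumption: one JSON reply arrives for each chunk sent.
        replies.append(json.loads(await ws.recv()))
    return replies
```

Step 1 (opening the connection) is left to whichever WebSocket client you use. The `x-api-key` header has to be supplied during the opening handshake, and how that is done depends on the client library.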

***

## Configuration Message

```json
{
  "sampleRate": 16000, // Required: Audio sample rate in Hz (must be 16000 for this endpoint)
  "boostWordList": ["โกวาจี"], // Optional: List of words to boost confidence score (default: empty)
  "boostWordScore": 5 // Optional: Confidence score boost for words in the list (default: 0)
}
```

<table><thead><tr><th width="218">Name</th><th width="142">Type</th><th width="130">Required</th><th>Description</th></tr></thead><tbody><tr><td>sampleRate</td><td>number</td><td>Yes</td><td>Audio sample rate in Hz (must be <code>16000</code> for this endpoint)</td></tr><tr><td>boostWordList</td><td>string[]</td><td>No</td><td>Add specific words to increase the chance of these words appearing in results.<br><br>Read more <a href="/pages/NtrKnqwKRLjg1keTYkjE">details</a>.</td></tr><tr><td>boostWordScore</td><td>integer</td><td>No</td><td>A number from 1 to 20 that increases the chance of the <code>boostWordList</code> words appearing in results.<br><br>Read more <a href="/pages/NtrKnqwKRLjg1keTYkjE">details</a>.</td></tr></tbody></table>
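
Because the JSON example above carries explanatory comments, it cannot be sent as-is; the message on the wire has to be plain JSON. The following sketch builds such a message. `build_config` is a hypothetical helper, the key names are taken from the JSON example above, and the 1 to 20 range check follows the table description.

```python
import json

def build_config(boost_words=None, boost_score=None, sample_rate=16000):
    """Return the configuration message as a JSON string."""
    if sample_rate != 16000:
        raise ValueError("this endpoint requires a sampleRate of 16000")
    config = {"sampleRate": sample_rate}
    if boost_words:
        config["boostWordList"] = list(boost_words)
    if boost_score is not None:
        if not 1 <= boost_score <= 20:
            raise ValueError("boost score must be between 1 and 20")
        config["boostWordScore"] = boost_score
    # ensure_ascii=False keeps Thai words readable in the serialized text
    return json.dumps(config, ensure_ascii=False)
```

Calling `build_config(["โกวาจี"], 5)` produces a message with the same three fields as the example, while `build_config()` produces the minimal message containing only `sampleRate`.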

***

## Audio Data

* Send raw audio data chunks following the configuration message.
* The optimal chunk size is 0.5 seconds of audio.
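
To turn the 0.5-second recommendation into a byte count, multiply the sample rate by the duration and by the bytes per sample. The sketch below does this and splits a buffer accordingly. It assumes single-channel (mono) audio, which this page does not state explicitly; `chunk_pcm16` is a hypothetical helper name.

```python
def chunk_pcm16(pcm, sample_rate=16000, seconds=0.5, channels=1):
    """Yield successive chunks of PCM 16-bit audio covering `seconds` each.

    Assumption: mono audio (channels=1) unless told otherwise.
    """
    bytes_per_sample = 2  # 16 bits
    bytes_per_chunk = int(sample_rate * seconds) * channels * bytes_per_sample
    for start in range(0, len(pcm), bytes_per_chunk):
        # The last chunk may be shorter than the others.
        yield pcm[start:start + bytes_per_chunk]
```

Under these assumptions a 0.5-second chunk at 16000 Hz is 8000 samples, or 16000 bytes.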

***

## Response

The server responds to each audio chunk it receives. Each response carries a speech event for that chunk. If no speech is detected, the event is "Silent". If speech is detected, the event is "SpeakOn" and the response includes a partial transcript. When the speech ends (end of a sentence), or when the server finishes the current transcription and starts a new one, the event is "SpeakOff".

***

## Response Events

### **Silent**

No speech was detected in the received audio chunk.

#### **Example JSON**

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 0.5,
  "output": {
    "event": "Silent",
    "results": null
  }
}
```

***

### **SpeakOn**

Speech was detected; a partial transcript is provided.

#### **Example JSON**

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 14.001,
  "output": {
    "event": "SpeakOn",
    "results": {
      "transcript": "This is the",
      "startTime": 11.0,
      "endTime": 13.001
    },
    "version": "1.0.0",
    "duration": 14.001
  }
}
```

***

### **SpeakOff**

The spoken phrase has ended; the final recognized text is provided.

#### **Example JSON**

```json
{
  "type": "ASR_PULSE_STREAM",
  "amount": 16.001,
  "output": {
    "event": "SpeakOff",
    "results": {
      "transcript": "This is the complete sentence",
      "startTime": 11.0,
      "endTime": 16.001
    },
    "version": "1.0.0",
    "duration": 16.001
  }
}
```
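
A client usually needs to tell the three events apart and keep only the final text. The sketch below parses messages shaped like the examples above, relying only on the `output.event` and `output.results.transcript` fields shown there. `parse_event` and `final_transcripts` are hypothetical helper names.

```python
import json

def parse_event(raw):
    """Return (event, transcript) from one server message.

    The transcript is None when "results" is null, as in the Silent example.
    """
    output = json.loads(raw)["output"]
    results = output.get("results") or {}
    return output["event"], results.get("transcript")

def final_transcripts(raw_messages):
    """Keep only the transcripts carried by SpeakOff events."""
    finals = []
    for raw in raw_messages:
        event, transcript = parse_event(raw)
        if event == "SpeakOff" and transcript:
            finals.append(transcript)
    return finals
```

Partial transcripts from "SpeakOn" events are skipped here; a live caption display would show them and then replace them once the matching "SpeakOff" arrives.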

***

## Limitations

* This endpoint currently only supports PCM 16-bit audio format.
* The configuration message currently only allows specifying the sample rate, boost word list, and boost word score.


