Real-time Transcription
Quick Start
To quickly start using Gowajee's real-time transcription API, you can use the provided GitHub repository example.
GitHub Example Repository: Gowajee Real-time Transcription API Example
This repository contains a sample implementation demonstrating how to connect to the WebSocket endpoint, send audio data, and handle responses.
Steps to Run the Example:
Clone the repository:
git clone https://github.com/Gowajee-ai/gowajee-streaming-api-example.git
Navigate to the project directory:
cd gowajee-streaming-api-example
Follow the setup instructions in the repository's README file to install dependencies and configure the example.
Run the example script to start sending audio data to the API and receive transcriptions in real time.
By following these steps, you can quickly integrate and test Gowajee's real-time transcription capabilities in your own applications.
Details
Protocol:
WebSocket
Endpoint:
wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe
Note
Supported Audio Format: PCM 16-bit only.
(Most sound libraries default to using PCM 16-bit for audio recording.)
Pulse Code Modulation (PCM) is a method used to digitally represent analog signals. In the context of audio, PCM is a standard format for storing and transmitting uncompressed audio data, which makes it well suited to high-quality audio applications. The "16-bit" in PCM 16-bit refers to the bit depth, which is the number of bits used to represent each audio sample.
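If your audio source produces floating-point samples rather than PCM 16-bit, a conversion along the following lines may help. This is a minimal sketch: the function name floats_to_pcm16 is hypothetical, and little-endian byte order is an assumption, since the documentation above does not state the expected byte order.

```python
import struct

def floats_to_pcm16(samples):
    """Convert float samples in [-1.0, 1.0] to PCM 16-bit bytes.

    Assumption: little-endian signed integers ("<h"); the byte order
    is not specified by the API documentation.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the signed 16-bit range
    return struct.pack("<%dh" % len(ints), *ints)

pcm = floats_to_pcm16([0.0, 1.0, -1.0])
print(len(pcm))  # 6 (each sample occupies 2 bytes)
```

Each sample becomes exactly two bytes, which is what "16-bit" means in practice.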
Supported Models
Pulse
pulse
Headers
x-api-key (string, required): An API key to access the service.
Connection
Open a WebSocket connection to the provided URL.
Send a configuration message in JSON format as the first message after connecting.
Send subsequent messages containing raw audio data chunks in the PCM 16-bit format.
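The connection steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the official client: it uses the third-party websocket-client package, the helper names build_handshake and stream are hypothetical, and sending audio as binary WebSocket frames is an assumption (the documentation only says "raw audio data chunks").

```python
import json

ENDPOINT = "wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe"

def build_handshake(api_key):
    """Return (headers, first_message) for a new session."""
    headers = ["x-api-key: " + api_key]         # header required by the endpoint
    config = json.dumps({"sampleRate": 16000})  # must be the first message sent
    return headers, config

def stream(api_key, chunks):
    """Open the socket, send the configuration, then send each audio chunk."""
    import websocket  # third-party: pip install websocket-client

    headers, config = build_handshake(api_key)
    ws = websocket.create_connection(ENDPOINT, header=headers)
    try:
        ws.send(config)            # step 2: JSON configuration as a text frame
        for chunk in chunks:
            ws.send_binary(chunk)  # step 3: assumption, audio sent as binary frames
            print(ws.recv())       # read the server's response for this chunk
    finally:
        ws.close()
```

Calling stream("YOUR_API_KEY", chunks) with an iterable of PCM 16-bit byte strings would run one session. The websocket import sits inside stream so that build_handshake can be used without the package installed.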
Configuration Message
{
"sampleRate": 16000, // Required: Audio sample rate in Hz (must be 16000 for this endpoint)
"boostWordList": ["โกวาจี"], // Optional: List of words to boost confidence score (default: empty)
"boostWordScore": 5 // Optional: Confidence score boost for words in the list (default: 0)
}
sampleRate (number, required): Audio sample rate in Hz. Must be 16000 for this endpoint.
boostWordList (string[], optional): Specific words to boost so that they are more likely to appear in the results.
boostWordScore (integer, optional): A number from 1 to 20 that controls how strongly the words in boostWordList are boosted.
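A small helper can assemble the configuration message and reject out-of-range values before they reach the server. This is a sketch only: build_config is a hypothetical name, the score key is written as boostWordScore to match the JSON example above, and the range check mirrors the 1 to 20 range given in the parameter description.

```python
import json

def build_config(boost_words=None, boost_score=None):
    """Build the JSON configuration message sent right after connecting."""
    config = {"sampleRate": 16000}  # the only sample rate this endpoint accepts
    if boost_words:
        config["boostWordList"] = list(boost_words)
    if boost_score is not None:
        if not 1 <= boost_score <= 20:
            raise ValueError("boost score must be between 1 and 20")
        config["boostWordScore"] = int(boost_score)
    # ensure_ascii=False keeps non-ASCII words (for example Thai) readable
    return json.dumps(config, ensure_ascii=False)

print(build_config(["โกวาจี"], 5))
```

Optional keys are left out entirely when they are not supplied, so the server's defaults apply.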
Audio Data
Send raw audio data chunks following the configuration message.
The optimal chunk size is 0.5 seconds of audio.
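To translate the 0.5-second recommendation into bytes, multiply the sample rate by the duration and by the bytes per sample. The sketch below assumes single-channel (mono) audio, which the documentation does not state; the names chunk_size_bytes and iter_chunks are hypothetical.

```python
SAMPLE_RATE = 16000   # Hz, required by the endpoint
BYTES_PER_SAMPLE = 2  # PCM 16-bit
CHANNELS = 1          # assumption: mono audio

def chunk_size_bytes(seconds=0.5):
    """Number of bytes that hold `seconds` of audio."""
    return int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE * CHANNELS

def iter_chunks(pcm, seconds=0.5):
    """Yield successive chunks of raw PCM bytes; the last one may be shorter."""
    size = chunk_size_bytes(seconds)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]

print(chunk_size_bytes())  # 16000
```

Under these assumptions, each 0.5-second chunk is 16,000 bytes (8,000 samples of 2 bytes each).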
Response
The server responds to each audio chunk it receives. Each response includes a speech event for that chunk. If no speech is detected, the event is "Silent". If speech is detected, the event is "SpeakOn" and the response includes a partial transcript. When the speech ends (end of a sentence), or when the server finishes the current transcription and starts a new one, the event is "SpeakOff".
Response Events
Silent
No speech was detected in the received audio chunk.
Example JSON
{
"type": "ASR_PULSE_STREAM",
"amount": 0.5,
"output": {
"event": "Silent",
"results": null
}
}
SpeakOn
Speech detected, partial transcript provided.
Example JSON
{
"type": "ASR_PULSE_STREAM",
"amount": 14.001,
"output": {
"event": "SpeakOn",
"results": {
"transcript": "This is the",
"startTime": 11.0,
"endTime": 13.001
},
"version": "1.0.0",
"duration": 14.001
}
}
SpeakOff
End of the spoken phrase; the final recognized text is provided.
Example JSON
{
"type": "ASR_PULSE_STREAM",
"amount": 16.001,
"output": {
"event": "SpeakOff",
"results": {
"transcript": "This is the complete sentence",
"startTime": 11.0,
"endTime": 16.001
},
"version": "1.0.0",
"duration": 16.001
}
}
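A client might dispatch on the event field roughly as shown below. This is a minimal sketch based only on the example payloads above; handle_response is a hypothetical name, and real responses may carry fields or events not covered here.

```python
import json

def handle_response(raw, finals):
    """Parse one server response and return a short description of it.

    Final transcripts ("SpeakOff") are appended to the `finals` list.
    """
    output = json.loads(raw)["output"]
    event = output["event"]
    results = output.get("results")
    if event == "SpeakOn":
        return "partial: " + results["transcript"]
    if event == "SpeakOff":
        finals.append(results["transcript"])
        return "final: " + results["transcript"]
    return None  # "Silent" (or any event not handled in this sketch)

finals = []
raw = '{"type": "ASR_PULSE_STREAM", "amount": 0.5, "output": {"event": "Silent", "results": null}}'
print(handle_response(raw, finals))  # None
```

Because "SpeakOn" transcripts are partial, a display would typically overwrite the current line on each "SpeakOn" and commit the text only on "SpeakOff".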
Limitations
This endpoint currently only supports PCM 16-bit audio format.
The configuration message currently only allows specifying the sample rate, boost word list, and boost word score.