Real-time Transcription
To quickly start using Gowajee's real-time transcription API, you can use the provided GitHub repository example.
GitHub Example Repository:
This repository contains a sample implementation demonstrating how to connect to the WebSocket endpoint, send audio data, and handle responses.
Clone the repository:
Navigate to the project directory:
Follow the setup instructions in the repository's README file to install dependencies and configure the example.
Run the example script to start sending audio data to the API and receive transcriptions in real time.
By following these steps, you can quickly integrate and test Gowajee's real-time transcription capabilities in your own applications.
Protocol: WebSocket
Endpoint: wss://api.gowajee.ai/v1/speech-to-text/pulse/stream/transcribe
Supported Audio Format: PCM 16-bit only.
(Most sound libraries default to using PCM 16-bit for audio recording.)
Pulse Code Modulation (PCM) is a method used to digitally represent analog signals. In audio, PCM is a standard format for storing and transmitting uncompressed audio data, which makes it well suited to high-quality audio applications. The "16-bit" in PCM 16-bit refers to the bit depth: the number of bits used to represent each audio sample.
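As an illustration of what "16 bits per sample" means in practice, the sketch below converts floating-point samples into PCM 16-bit bytes. The helper name `float_to_pcm16` is hypothetical, and the little-endian byte order is an assumption (it is the common default for recording libraries, but this page does not state it).

```python
import struct

def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to signed 16-bit PCM bytes.

    Assumption: little-endian byte order ("<"), not confirmed by the docs.
    """
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip out-of-range values
        ints.append(int(round(s * 32767)))  # scale to the 16-bit signed range
    return struct.pack("<%dh" % len(ints), *ints)

pcm = float_to_pcm16([0.0, -1.0, 1.0])
print(len(pcm))  # each sample occupies 2 bytes
```

If your recording library already produces 16-bit integer samples, no conversion of this kind is needed.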
Pulse
pulse
x-api-key (string, required): An API key to access the service.
Open a WebSocket connection to the provided URL.
Send a configuration message in JSON format as the first message after connecting.
Send subsequent messages containing raw audio data chunks in the PCM 16-bit format.
sampleRate (number, required): Audio sample rate in Hz. Must be 16000 for this endpoint.
boostWordList (string[], optional)
boostScore (integer, optional)
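A minimal sketch of building the configuration message from the fields above. The helper name `build_config_message` is hypothetical, and placing the fields at the top level of a single JSON object is an assumption; the page names the fields but does not show the exact message layout. The 1 to 20 range check follows the boostScore description on this page.

```python
import json

def build_config_message(boost_words=None, boost_score=None, sample_rate=16000):
    """Return the first WebSocket message as a JSON string.

    Assumption: the fields are sent as one flat JSON object.
    """
    if sample_rate != 16000:
        raise ValueError("this endpoint requires sampleRate 16000")
    config = {"sampleRate": sample_rate}
    if boost_words:
        config["boostWordList"] = list(boost_words)
    if boost_score is not None:
        if not 1 <= boost_score <= 20:
            raise ValueError("boostScore must be between 1 and 20")
        config["boostScore"] = int(boost_score)
    # ensure_ascii=False keeps non-ASCII boost words readable in the payload
    return json.dumps(config, ensure_ascii=False)

message = build_config_message(boost_words=["Gowajee"], boost_score=10)
print(message)
```

The optional fields are left out of the message entirely when they are not supplied, rather than being sent as null.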
Send raw audio data chunks following the configuration message.
The optimal chunk size is 0.5 seconds.
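To make the chunk size concrete: at 16000 Hz with 2 bytes per sample, 0.5 seconds of audio is 8000 samples, or 16000 bytes, assuming a single (mono) channel. The channel count is an assumption, since this page does not state it, and `chunk_pcm16` is a hypothetical helper name.

```python
def chunk_pcm16(pcm, sample_rate=16000, chunk_seconds=0.5, channels=1):
    """Yield successive chunks of raw PCM 16-bit audio.

    Assumption: mono audio (channels=1); each sample is 2 bytes.
    The final chunk may be shorter than the others.
    """
    samples_per_chunk = int(sample_rate * chunk_seconds)
    bytes_per_chunk = samples_per_chunk * 2 * channels
    for start in range(0, len(pcm), bytes_per_chunk):
        yield pcm[start:start + bytes_per_chunk]

one_second_of_silence = bytes(16000 * 2)  # 16000 samples, 2 bytes each
chunks = list(chunk_pcm16(one_second_of_silence))
print(len(chunks), len(chunks[0]))
```

Each yielded chunk would then be sent as its own WebSocket message after the configuration message.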
The server responds to each audio chunk it receives. Each response includes a speech event for that chunk. If no speech is detected, the event is "Silent". If speech is detected, the event is "SpeakOn" and the response includes a partial transcript. When the speech ends (end of a sentence), or when the server finishes the current transcription and starts a new one, the event is "SpeakOff".
Silent: No speech was detected in the received audio chunk.
SpeakOn: Speech was detected; a partial transcript is provided.
SpeakOff: The spoken phrase has ended; the final recognized text is provided.
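One possible way to consume these events is sketched below. Only the event names come from this page; the `TranscriptCollector` class is hypothetical, and because the response field names are not documented here, the handler takes the event name and text as plain arguments and leaves extracting them from the server response to the caller.

```python
class TranscriptCollector:
    """Track the current partial transcript and the finalized phrases."""

    def __init__(self):
        self.partial = ""
        self.finals = []

    def handle(self, event, text=""):
        if event == "Silent":
            return                      # nothing spoken in this chunk
        if event == "SpeakOn":
            self.partial = text         # replace with the newest partial
        elif event == "SpeakOff":
            self.finals.append(text)    # phrase finished: keep final text
            self.partial = ""
        else:
            raise ValueError("unknown event: %r" % (event,))

collector = TranscriptCollector()
for event, text in [("Silent", ""), ("SpeakOn", "hello"),
                    ("SpeakOn", "hello wor"), ("SpeakOff", "hello world")]:
    collector.handle(event, text)
print(collector.finals)
```

Replacing the partial on every "SpeakOn" rather than appending to it is a design choice that assumes each partial transcript covers the whole phrase so far.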
This endpoint currently only supports PCM 16-bit audio format.
The configuration message currently only allows specifying the sample rate, boost word list, and boost word score.
boostWordList: Specific words to add in order to increase the chance of these words appearing in results. Read more.
boostScore: A number from 1 to 20 that increases the chance of the boostWordList words appearing in results. Read more.