Synchronous API (Request-Response)
Introduction
This section describes the synchronous HTTP API for Gowajee's speech-to-text (STT) service. The client sends a request and waits until STT processing is complete; the full transcription is then returned in a single response.
Note: At the moment, the Synchronous API supports only the Pulse model.
Workflow
Send Request: The client sends an HTTP request to the Gowajee API endpoint, including the audio data to be transcribed.
Processing: The server processes the audio data using the specified STT model.
Receive Response: The server responds with the transcription result once the processing is complete.
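The three steps above can be sketched in Python using only the standard library. This is a minimal sketch, not an official client: it assumes the audio is sent as a base64 string in a JSON body with a Content-Type of application/json, and the function names, the default model and the 120-second timeout are illustrative choices rather than documented values.

```python
import base64
import json
import urllib.request

# Documented endpoint template; {model} stands in for ${MODEL} ("pulse" or "cosmos").
ENDPOINT = "https://api.gowajee.ai/v1/speech-to-text/{model}/transcribe"


def build_request(audio: bytes, api_key: str, model: str = "pulse") -> urllib.request.Request:
    """Step 1: package the audio as a base64 string inside a JSON body."""
    body = {"audioData": base64.b64encode(audio).decode("ascii")}
    return urllib.request.Request(
        ENDPOINT.format(model=model),
        data=json.dumps(body).encode("utf-8"),
        # Content-Type is an assumption; only x-api-key is documented.
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio: bytes, api_key: str, model: str = "pulse",
               timeout: float = 120.0) -> dict:
    """Steps 2-3: block until the server finishes, then decode the JSON result."""
    request = build_request(audio, api_key, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the call is synchronous, transcribe() does not return until the server has finished processing, so long recordings may need a larger timeout.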
Request
Method: POST
Endpoint: https://api.gowajee.ai/v1/speech-to-text/${MODEL}/transcribe
Replace ${MODEL} with one of the supported model values listed below.
Supported Models
Pulse: pulse
Cosmos: cosmos
Headers
x-api-key (string, required): An API key for accessing the service.
Body Parameters
audioData (string, required): The audio content, either as a base64-encoded string or uploaded as multipart/form-data.
getSpeakingRate (boolean, optional): Return the speaking rate in syllables per second.
getWordTimestamps (boolean, optional): Return timestamps for every word in the transcription. Available only for the Pulse model.
boostWordList (string[], optional)
boostScore (integer, optional)
multichannels (boolean, optional): Set multichannels=true if your audioData contains multichannel audio. This is useful when multiple speakers are recorded on separate channels. Read more details.
diarization (boolean, optional): Set diarization=true to separate speakers using the diarization feature. Read more details.
refSpeakers (RefSpeaker[], optional): Reference recordings of each speaker's voice, about 4-5 seconds long, used for diarization. You can upload multiple audio files; the service assumes each file corresponds to a different speaker. If the service cannot determine which speaker a particular transcription belongs to, it labels the speaker as 'unknown'. Read more details.
sampleRate (integer, optional; required for Raw Audio Format): The number of audio samples per second, measured in hertz (Hz). Read more about Raw Audio Format.
sampleWidth (integer, optional; required for Raw Audio Format): The size of each audio sample in bytes (1 means 8-bit, 2 means 16-bit, and so on), also known as bit depth. It directly affects the dynamic range of the audio signal. Read more about Raw Audio Format.
channels (integer, optional; required for Raw Audio Format): The number of independent audio signals in the audio, for example 1 for mono or 2 for stereo. Read more about Raw Audio Format.
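For Raw Audio Format, the three format fields must describe the audio exactly. The sketch below assumes that "raw" means headerless PCM and uses Python's wave module to split an existing WAV file into raw frames plus matching sampleRate, sampleWidth and channels values. It also enforces the one model restriction stated above (getWordTimestamps is Pulse-only). The helpers raw_params_from_wav and build_body are hypothetical, not part of any Gowajee SDK.

```python
import base64
import io
import wave
from typing import Dict, Optional, Tuple

# Fields the parameter table marks as required for Raw Audio Format.
RAW_FORMAT_FIELDS = ("sampleRate", "sampleWidth", "channels")


def raw_params_from_wav(wav_bytes: bytes) -> Tuple[bytes, Dict[str, int]]:
    """Split a WAV file into headerless PCM frames plus the raw-format fields."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        params = {
            "sampleRate": wav.getframerate(),   # samples per second (Hz)
            "sampleWidth": wav.getsampwidth(),  # bytes per sample: 1 = 8-bit, 2 = 16-bit
            "channels": wav.getnchannels(),     # 1 = mono, 2 = stereo
        }
        frames = wav.readframes(wav.getnframes())
    return frames, params


def build_body(audio: bytes, model: str = "pulse",
               raw_params: Optional[Dict[str, int]] = None, **options) -> dict:
    """Assemble a JSON request body (hypothetical helper)."""
    # getWordTimestamps is documented as available only for the Pulse model.
    if options.get("getWordTimestamps") and model != "pulse":
        raise ValueError("getWordTimestamps is available only for the Pulse model")
    body = {"audioData": base64.b64encode(audio).decode("ascii"), **options}
    if raw_params is not None:
        # Headerless audio cannot describe itself, so all three fields are required.
        missing = [name for name in RAW_FORMAT_FIELDS if name not in raw_params]
        if missing:
            raise ValueError("raw audio requires: " + ", ".join(missing))
        body.update({name: raw_params[name] for name in RAW_FORMAT_FIELDS})
    return body
```

For a stereo recording in which each channel carries a different speaker, build_body(frames, raw_params=params, multichannels=True) would combine the raw-format fields with the multichannels flag.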
Response
{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {
        "transcript": "วันนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517
      }
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
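The transcription is nested under output.results, with one entry per segment carrying transcript, startTime and endTime. A minimal Python sketch for pulling the segments out follows. It treats the times as seconds, an assumption based on endTime matching duration in the example, and does not interpret the type or amount fields, whose meaning is not described in this section. The names Segment, parse_segments and full_transcript are illustrative.

```python
import json
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # assumed to be seconds from the start of the audio
    end: float
    text: str


def parse_segments(response: dict) -> List[Segment]:
    """Collect the transcribed segments from output.results, in order."""
    results = response.get("output", {}).get("results", [])
    return [Segment(r["startTime"], r["endTime"], r["transcript"]) for r in results]


def full_transcript(response: dict, separator: str = " ") -> str:
    """Concatenate every segment's transcript with the given separator."""
    return separator.join(seg.text for seg in parse_segments(response))


# The example response from this section, embedded for illustration.
SAMPLE_RESPONSE = json.loads("""
{"type": "ASR_PULSE", "amount": 4.517,
 "output": {"results": [{"transcript": "วันนี้กินอะไรดี", "startTime": 0, "endTime": 4.517}],
            "duration": 4.517, "version": "2.2.0"}}
""")
```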