Synchronous API (Request-Response)

Introduction

This section describes the Synchronous HTTP API for Gowajee's speech-to-text (STT) service. With this API, clients send a request to the server and wait until the STT processing is complete to receive the result. This approach ensures that you receive the transcription data in a single, straightforward response.

Note: At the moment, the Synchronous API supports only the Pulse model.

Workflow

  1. Send Request: The client sends an HTTP request to the Gowajee API endpoint, including the audio data to be transcribed.

  2. Processing: The server processes the audio data using the specified STT model.

  3. Receive Response: The server responds with the transcription result once the processing is complete.


Request

  • Method: POST

  • Endpoint: https://api.gowajee.ai/v1/speech-to-text/${MODEL}/transcribe

Supported Models

| Model | Value |
| --- | --- |
| Pulse | pulse |
| Cosmos | cosmos |

Headers

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| x-api-key | string | Yes | An API key to access the service. |

Body Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| audioData | string | Yes | Audio data, as a base64-encoded string or as multipart/form-data. |
| getSpeakingRate | boolean | No | Return the speaking rate (syllables per second). |
| getWordTimestamps | boolean | No | Return timestamps for all words in the transcription. Available only for the Pulse model. |
| boostWordList | string[] | No | Specific words to boost so that they are more likely to appear in the results. Available only for the Pulse and Cosmos models. Read more details. |
| boostScore | integer | No | A number between 1 and 20 that increases the chance of the words in boostWordList appearing in the results. Available only for the Pulse and Cosmos models. Read more details. |
| multichannels | boolean | No | Set multichannels=true if your audioData is multichannel audio. Useful for audio where multiple speakers are recorded on separate channels. Read more details. |
| diarization | boolean | No | Set diarization=true to perform speaker separation with the diarization feature. Read more details. |
| numSpeakers | integer | No | Number of speakers in your audioData. Read more details. |
| minSpeakers | integer | No | Minimum number of speakers in your audioData. Read more details. |
| maxSpeakers | integer | No | Maximum number of speakers in your audioData. Read more details. |
| refSpeakers | RefSpeaker[] | No | 4-5 seconds of each speaker's voice, used as a reference for diarization. You can upload multiple audio files, and the service assumes each file corresponds to a different speaker. If the service cannot determine which speaker a transcription belongs to, it labels that speaker as 'unknown'. Read more details. |
| sampleRate | integer | No (Required for Raw Audio Format) | The number of audio samples carried per second, measured in Hertz (Hz); it defines how many data points of audio are sampled in one second. Read more about Raw Audio Format. |
| sampleWidth | integer | No (Required for Raw Audio Format) | The sample width, also known as bit depth, is the number of bits used to represent each audio sample; it directly affects the dynamic range of the signal (1 means 8-bit, 2 means 16-bit, etc.). Read more about Raw Audio Format. |
| channels | integer | No (Required for Raw Audio Format) | The number of independent audio signals or paths in the audio file; common values are mono (1 channel) and stereo (2 channels). Read more about Raw Audio Format. |
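
As a rough usage sketch (not an official client library), the snippet below base64-encodes a local audio file and posts it to the Pulse endpoint with the x-api-key header. The file name, the API key value, and the choice of sending the base64 audio in a JSON body (rather than multipart/form-data) are assumptions for illustration.

import base64
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: replace with your Gowajee API key
MODEL = "pulse"           # see Supported Models; "cosmos" is the other value
ENDPOINT = f"https://api.gowajee.ai/v1/speech-to-text/{MODEL}/transcribe"

# Read a local audio file (hypothetical name) and base64-encode it.
with open("example.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

# Synchronous call: the request blocks until transcription is complete.
response = requests.post(
    ENDPOINT,
    headers={"x-api-key": API_KEY},
    json={
        "audioData": audio_b64,
        "getWordTimestamps": True,  # optional; Pulse model only
    },
    timeout=300,
)
response.raise_for_status()
print(response.json())

Because the call is synchronous, long recordings keep the connection open for the entire transcription; the Asynchronous API described in the next section avoids this.
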
Response

{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {
        "transcript": "วันนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517
      }
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
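
As a sketch of reading this response (continuing from the request example above and assuming the same response object), the transcript segments can be read from output.results:

# Continuing from the request sketch above.
result = response.json()

print("Service version:", result["output"]["version"])
print("Audio duration (s):", result["output"]["duration"])

# Each result segment carries the transcript and its time range.
for segment in result["output"]["results"]:
    print(f'{segment["startTime"]:.2f}-{segment["endTime"]:.2f}s: {segment["transcript"]}')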

