Gowajee API

Raw Audio Format

A raw audio file is any file containing un-containerized, uncompressed audio. The data is stored as raw pulse-code modulation (PCM) values with no metadata header, so properties such as sampling rate, bit depth, endianness, and number of channels are not recorded in the file.
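
To make the distinction concrete, here is a minimal sketch that strips the container from a WAV file using Python's standard `wave` module. The helper names `wav_to_raw` and `make_silent_wav` are hypothetical and not part of the Gowajee API. The header values are returned separately because the raw bytes no longer carry them.

```python
import io
import wave


def wav_to_raw(wav_bytes: bytes):
    """Strip the WAV container and return (raw_pcm, params).

    Hypothetical helper: the keys of `params` mirror the request
    parameters described on this page.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        params = {
            "sampleRate": wav.getframerate(),
            "sampleWidth": wav.getsampwidth(),  # bytes per sample, not bits
            "channels": wav.getnchannels(),
        }
        raw_pcm = wav.readframes(wav.getnframes())
    return raw_pcm, params


def make_silent_wav(seconds: int = 1, rate: int = 16000,
                    width: int = 2, channels: int = 1) -> bytes:
    """Build an in-memory WAV file of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (rate * width * channels * seconds))
    return buf.getvalue()
```

The WAV file is larger than the raw PCM it contains. The difference is the header that a raw file omits.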

Extensions

Raw files can have a wide range of file extensions, common ones being .raw, .pcm, or .sam. They can also have no extension.


Transcribing Raw Audio Format

To transcribe raw audio, send the request with the following parameters. Because a raw file carries no header, these parameters tell the Gowajee STT API how to interpret the audio data.

Required Parameters

  1. sampleRate (Integer): The sample rate is the number of audio samples taken per second, measured in hertz (Hz).

    • Example: 16000, 44100

  2. sampleWidth (Integer): The sample width is the size of each audio sample in bytes. It corresponds to the bit depth (bytes × 8 = bits) and directly affects the dynamic range of the audio signal.

    • Example: 1 (8-bit), 2 (16-bit)

  3. channels (Integer): The number of independent audio signals in the audio data. Common values are 1 (mono) and 2 (stereo).

    • Example: 1 (mono), 2 (stereo)
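
Together, the three parameters fix how many bytes make up one second of audio: sampleRate × sampleWidth × channels. The sketch below uses that relation to estimate the duration of a raw buffer, which is a quick sanity check on the values you plan to send. The function name `raw_audio_duration` is hypothetical.

```python
def raw_audio_duration(num_bytes: int, sample_rate: int,
                       sample_width: int, channels: int) -> float:
    """Return the duration in seconds of a raw PCM buffer.

    sample_width is in bytes (1 = 8-bit, 2 = 16-bit).
    """
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


# 16 kHz, 16-bit, mono: 32000 bytes per second
one_second = raw_audio_duration(32000, 16000, 2, 1)

# Declaring the same buffer as stereo halves the computed duration,
# which is why mismatched parameters produce wrong results.
as_stereo = raw_audio_duration(32000, 16000, 2, 2)
```

If the computed duration does not match the known length of the recording, one of the three values is probably wrong.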

Example Request

POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "sampleRate": 16000,
  "sampleWidth": 2,
  "channels": 1
}
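
As a minimal sketch, the request above could be assembled in Python with the standard library only. The `https://` scheme, the helper name `build_transcribe_request`, and the default parameter values are assumptions. The model name and API key are placeholders that you supply.

```python
import base64
import json
import urllib.request

# The https scheme is an assumption; host and path come from the example request.
API_URL = "https://api.gowajee.ai/v1/speech-to-text/{model}/transcribe"


def build_transcribe_request(raw_pcm: bytes, model: str, api_key: str,
                             sample_rate: int = 16000, sample_width: int = 2,
                             channels: int = 1) -> urllib.request.Request:
    """Build (but do not send) a transcription request for raw PCM audio."""
    payload = {
        # Raw bytes must be base64-encoded to travel inside JSON.
        "audioData": base64.b64encode(raw_pcm).decode("ascii"),
        "sampleRate": sample_rate,
        "sampleWidth": sample_width,
        "channels": channels,
    }
    return urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen` and read the response body as JSON.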

Example Response

{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {
        "transcript": "วันนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517
      }
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
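
A response of this shape can be read with a few lines of Python. This is a sketch that assumes every response follows the structure of the example above. The function name `extract_segments` is hypothetical.

```python
import json


def extract_segments(response_body: str):
    """Return a list of (startTime, endTime, transcript) tuples.

    Assumes the layout shown in the example response:
    output.results[] with transcript, startTime, and endTime fields.
    """
    data = json.loads(response_body)
    return [
        (item["startTime"], item["endTime"], item["transcript"])
        for item in data["output"]["results"]
    ]


example = """
{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {"transcript": "วันนี้กินอะไรดี", "startTime": 0, "endTime": 4.517}
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
"""

segments = extract_segments(example)
```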

Last updated 11 months ago