Raw Audio Format

A raw audio file is any file containing uncompressed audio with no container format. The data is stored as raw pulse-code modulation (PCM) values without any header metadata (such as sampling rate, bit depth, endianness, or number of channels).
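Because nothing in the file describes its layout, the same bytes can be read as entirely different samples. The following is a minimal Python sketch of that ambiguity; the helper name decode_pcm16 and the four sample bytes are illustrative assumptions, not part of the Gowajee API.

```python
import struct

# Four header-less bytes; nothing in the data says how to read them.
raw = bytes([0x00, 0x01, 0xFF, 0x7F])


def decode_pcm16(data, little_endian=True):
    """Interpret bytes as signed 16-bit PCM samples (hypothetical helper)."""
    count = len(data) // 2
    fmt = ("<" if little_endian else ">") + f"{count}h"
    return list(struct.unpack(fmt, data[: count * 2]))


as_le = decode_pcm16(raw, little_endian=True)   # 16-bit, little-endian
as_be = decode_pcm16(raw, little_endian=False)  # 16-bit, big-endian
as_u8 = list(raw)                               # 8-bit unsigned

print(as_le)  # [256, 32767]
print(as_be)  # [1, -129]
print(as_u8)  # [0, 1, 255, 127]
```

The three readings disagree, which is why the format details have to be supplied separately from the audio data.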

Extensions

Raw files can have a wide range of file extensions; common ones are .raw, .pcm, and .sam. They can also have no extension.

Transcribing Raw Audio Format

To transcribe raw audio, send the request with the following parameters. Because raw audio carries no header, these parameters tell the Gowajee STT API how to interpret the data.

Required Parameters

sampleRate (Integer): The number of audio samples per second, measured in Hertz (Hz).
- Example: 16000, 44100
sampleWidth (Integer): The size of each audio sample in bytes, which corresponds to the bit depth (bytes × 8). It directly affects the dynamic range of the audio signal.
- Example: 1 (8-bit), 2 (16-bit)
channels (Integer): The number of independent audio signals in the audio data. Common values are mono (1 channel) and stereo (2 channels).
- Example: 1 (mono), 2 (stereo)
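Together, the three parameters fix how many bytes make up one second of audio. As a rough sanity check before sending a request, the expected duration can be derived from the byte length. This sketch assumes sampleWidth is expressed in bytes, as the examples (1 for 8-bit, 2 for 16-bit) suggest; the function name is hypothetical.

```python
def pcm_duration_seconds(num_bytes, sample_rate, sample_width, channels):
    """Estimate the duration of header-less PCM data.

    sample_width is assumed to be in bytes per sample.
    """
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second


# 144,544 bytes of 16 kHz, 16-bit, mono audio
duration = pcm_duration_seconds(144544, 16000, 2, 1)
print(round(duration, 3))  # 4.517
```

If the computed duration is implausible for the recording (for example, half or double the expected length), one of the three parameters is probably wrong.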

Example Request

POST /v1/speech-to-text/${MODEL}/transcribe HTTP/1.1
Host: api.gowajee.ai
Content-Type: application/json
X-Api-Key: ${YOUR_API_KEY}

{
  "audioData": "base64_encoded_raw_audio_data",
  "sampleRate": 16000,
  "sampleWidth": 2,
  "channels": 1
}
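The same request could be assembled in Python roughly as follows. This is a sketch using only the standard library: the https scheme, the build_transcribe_request helper, and the "model-name" and "your-api-key" placeholders are assumptions, while the path, headers, and body fields mirror the example above.

```python
import base64
import json
import urllib.request


def build_transcribe_request(raw_audio, model, api_key,
                             sample_rate=16000, sample_width=2, channels=1):
    """Build (but do not send) a transcription request for raw PCM bytes."""
    body = {
        # Raw bytes are base64-encoded so they can travel inside JSON.
        "audioData": base64.b64encode(raw_audio).decode("ascii"),
        "sampleRate": sample_rate,
        "sampleWidth": sample_width,
        "channels": channels,
    }
    # Assumed URL: https scheme + Host and path from the example request.
    url = f"https://api.gowajee.ai/v1/speech-to-text/{model}/transcribe"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )


# Placeholder audio: 0.1 s of 16-bit mono silence at 16 kHz.
request = build_transcribe_request(b"\x00\x00" * 1600, "model-name", "your-api-key")
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the payload before spending API credit.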

Example Response

{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {
        "transcript": "วันนี้กินอะไรดี",
        "startTime": 0,
        "endTime": 4.517
      }
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
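A response of this shape could be read in Python as sketched below. The extract_transcript helper is hypothetical, and it assumes every entry in output.results carries a transcript field as in the example; other response shapes are not covered.

```python
import json

response_text = """
{
  "type": "ASR_PULSE",
  "amount": 4.517,
  "output": {
    "results": [
      {"transcript": "วันนี้กินอะไรดี", "startTime": 0, "endTime": 4.517}
    ],
    "duration": 4.517,
    "version": "2.2.0"
  }
}
"""


def extract_transcript(response):
    """Join the transcript of every result segment, in order."""
    segments = response["output"]["results"]
    return " ".join(segment["transcript"] for segment in segments)


parsed = json.loads(response_text)
transcript = extract_transcript(parsed)
duration = parsed["output"]["duration"]
print(transcript, duration)
```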


Last updated 1 year ago