Models
Gowajee.ai offers three distinct models for speech-to-text conversion, each optimized for different use cases.
Below is an overview of our models.
Pulse (RTF: ~0.17)
Pulse strikes the best balance between accuracy and speed, making it ideal for applications where both accuracy and responsiveness are crucial.
Cosmos (RTF: ~0.22)
Cosmos is the most accurate of the three models while remaining reasonably fast.
✅ Recommended for most use cases.
Astro* (Deprecated) (RTF: ~0.23**)
Astro is a GPU-based model optimized for accuracy, intended for critical applications where precision is paramount. Update (Oct 2024): Cosmos is now more accurate than Astro.
Note:
(*) The Astro model has low availability because it requires GPU resources. If you anticipate high demand and need to use the Astro model extensively, please contact us to discuss your requirements and ensure adequate resource allocation.
(**) The Astro model runs on serverless GPU servers. Its RTF does not include cold-start time.
RTF (Real-Time Factor): The time the model takes to process one second of audio (lower is better). An RTF of 0.17 means the model takes 0.17 seconds to process one second of audio.
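As a rough illustration of the RTF arithmetic, the sketch below estimates processing time as audio duration multiplied by RTF. The function name and the dictionary are illustrative and are not part of the Gowajee API. The RTF values are the approximate figures listed on this page, and real-world latency also includes network overhead and, for Astro, cold-start time.

```python
# Approximate RTF values from the model overview (illustrative constants,
# not values returned by any API).
APPROX_RTF = {"pulse": 0.17, "cosmos": 0.22, "astro": 0.23}


def estimate_processing_seconds(audio_seconds: float, model: str) -> float:
    """Estimate compute time as audio duration x RTF.

    Excludes network time and, for Astro, serverless cold-start time.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * APPROX_RTF[model]


if __name__ == "__main__":
    for name in APPROX_RTF:
        seconds = estimate_processing_seconds(60, name)
        print(f"{name}: ~{seconds:.1f} s to process 60 s of audio")
```

For a 60-second clip this gives roughly 10.2 s for Pulse, 13.2 s for Cosmos, and 13.8 s for Astro.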
Each speech-to-text (STT) model in Gowajee has its own API path for transcription requests. The list below maps each model name to its path segment; use the path that matches the model you intend to use.
Pulse: pulse
Cosmos: cosmos
Astro: astro
Use the following API endpoints for transcription requests based on the chosen model:
Pulse Model:
Endpoint: /v1/speech-to-text/pulse/transcribe
Cosmos Model:
Endpoint: /v1/speech-to-text/cosmos/transcribe
Astro Model:
Endpoint: /v1/speech-to-text/astro/transcribe/async
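The model-to-endpoint mapping above can be sketched as a small helper that returns the transcription path for a given model name. The function and constant names are hypothetical; only the path strings come from this page. The base URL, authentication, and request body are omitted because this page does not specify them.

```python
# Endpoint suffix per model path segment, taken from the endpoint list above.
# Astro's endpoint ends in /transcribe/async; Pulse and Cosmos end in /transcribe.
_ENDPOINT_SUFFIX = {
    "pulse": "transcribe",
    "cosmos": "transcribe",
    "astro": "transcribe/async",
}


def transcribe_path(model: str) -> str:
    """Return the transcription endpoint path for a model name (case-insensitive)."""
    key = model.strip().lower()
    if key not in _ENDPOINT_SUFFIX:
        known = ", ".join(sorted(_ENDPOINT_SUFFIX))
        raise ValueError(f"unknown model {model!r}; expected one of: {known}")
    return f"/v1/speech-to-text/{key}/{_ENDPOINT_SUFFIX[key]}"


if __name__ == "__main__":
    for name in ("Pulse", "Cosmos", "Astro"):
        print(name, "->", transcribe_path(name))
```

Keeping the suffix in the table rather than hard-coding it in the format string makes the different Astro path explicit and easy to update.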