Fast, accurate, multilingual speech recognition and speech summarization software
Get human-quality transcriptions with state-of-the-art speech recognition technology
Auto-summarize your recordings and highlight key moments in discussions
Integrate speech recognition technology into your apps in minutes
We support all major languages for speech-to-text transcription
The speech-to-text API supports almost all audio and video file formats
Accurate, multilingual speech recognition API for only 1.2¢ per minute
| Feature | SpeechText.AI | Watson Speech to Text | Google Speech API | Rev.AI |
|---|---|---|---|---|
| Price per Hour | $0.7-$1.0 | $1.2 | $1.44-$2.16 | $1.4 |
| Languages Supported | Multiple | Multiple | Multiple | Multiple |
| Punctuation/Casing | | | | |
| Keyword Highlights | | | | |
| Audio/Video Summarization | | | | |
| Integration Time | Up to 1 hour | 1-2 days | 1-2 days | 2-4 hours |
| All File Formats Accepted | | | | |
| Process Data From | Anywhere | Binary Data | Cloud Storage Bucket | Anywhere |
| Export as SRT/VTT | | | | |
| Free Technical Support | | | | |
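The per-minute and per-hour prices on this page agree: 1.2¢ per minute × 60 minutes = 72¢, or $0.72 per hour, the lower end of the $0.7-$1.0 range in the table. A minimal Python sketch for estimating the cost of a recording, assuming billing is strictly proportional to duration (the page does not say whether partial minutes are rounded up or how higher tiers are priced). `estimate_cost` and `PRICE_PER_MINUTE_USD` are hypothetical names; `Decimal` is used so that cent amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal, ROUND_HALF_UP

# Assumption: the 1.2 cents/minute price quoted on this page; actual tiers may differ.
PRICE_PER_MINUTE_USD = Decimal("0.012")


def estimate_cost(duration_seconds: int, price_per_minute: Decimal = PRICE_PER_MINUTE_USD) -> Decimal:
    """Estimate the transcription cost of a recording, rounded to whole cents.

    Assumes billing is linear in duration: no per-file minimum and no
    rounding up to whole minutes.
    """
    minutes = Decimal(duration_seconds) / Decimal(60)
    return (minutes * price_per_minute).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(estimate_cost(3600))     # one hour of audio -> 0.72
    print(estimate_cost(90 * 60))  # a 90-minute meeting -> 1.08
```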
Build accurate speech recognition applications in minutes. We handle the complexity behind the scenes and wrap it in a few lines of code.
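```python
import requests

secret_key = "SECRET_KEY"

# load the audio file into memory
with open("/path/to/your/file.mp3", mode="rb") as file:
    post_body = file.read()

API_URL = "https://api.speechtext.ai/recognize"
header = {"Content-Type": "application/octet-stream"}
options = {
    "key": secret_key,
    "language": "en-US",
    # sent as the lowercase string "true", as in the curl examples;
    # requests would otherwise serialize the Python bool as "True"
    "punctuation": "true",
    "format": "mp3",
}

# send the audio file to SpeechText.AI to create a transcription task
r = requests.post(API_URL, headers=header, params=options, data=post_body)
r.raise_for_status()  # stop on HTTP errors
print(r.text)  # the API's response to the task-creation request
```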
```shell
# create a transcription task
curl -H "Content-Type: application/octet-stream" --data-binary @/path/to/your/file.m4a "https://api.speechtext.ai/recognize?key=SECRET_KEY&language=en-US&punctuation=true&format=m4a"

# retrieve transcription results
curl -X GET "https://api.speechtext.ai/results?key=SECRET_KEY&task=TASK_ID&summary=true&summary_size=15&highlights=true&max_keywords=10"

# get captions
curl -X GET "https://api.speechtext.ai/results?key=SECRET_KEY&task=TASK_ID&output=srt&max_caption_words=10"

# process a public URL (percent-encode PUBLIC_URL if it contains characters such as & or ?)
curl -X GET "https://api.speechtext.ai/recognize?key=SECRET_KEY&url=PUBLIC_URL&language=en-US&punctuation=true&format=mp3"
```
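The captions request uses `output=srt`, which presumably returns captions in the SubRip (SRT) format. The page does not define `max_caption_words`; assuming it caps the number of words per caption cue, the minimal Python sketch below (standard library only, not part of any SpeechText.AI SDK) parses SRT text into cues and reports the longest one, which is one way to check that setting against a downloaded file. `parse_srt`, `longest_cue_words` and the sample captions are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert SRT timestamp parts (hours, minutes, seconds, milliseconds) to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text: str) -> List[Cue]:
    """Parse SRT text: blank-line-separated blocks of an index line, a timing line, and text lines."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # not a complete cue
        match = TIMING.search(lines[1])
        if match is None:
            continue  # skip blocks without a valid timing line
        g = match.groups()
        cues.append(Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), " ".join(lines[2:])))
    return cues


def longest_cue_words(cues: List[Cue]) -> int:
    """Return the largest number of words in any single cue (0 if there are none)."""
    return max((len(cue.text.split()) for cue in cues), default=0)


# A made-up two-cue sample in the SRT layout
SAMPLE = """1
00:00:00,000 --> 00:00:02,400
Welcome to the weekly sync.

2
00:00:02,400 --> 00:00:05,100
Let's start with the release status.
"""

if __name__ == "__main__":
    cues = parse_srt(SAMPLE)
    for cue in cues:
        print(cue.index, cue.start_ms, cue.end_ms, cue.text)
    print("longest cue:", longest_cue_words(cues), "words")
```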
```php
<?php
$secret_key = "SECRET_KEY";

# load the audio file into a binary string
$binary = file_get_contents('/path/to/your/file.m4a');
if ($binary === false) {
    exit("Could not read the audio file\n");
}

# endpoint and options to start a transcription task
$endpoint = "https://api.speechtext.ai/recognize?key=" . urlencode($secret_key) . "&language=en-US&punctuation=true&format=m4a";
$header = array('Content-Type: application/octet-stream');

# initialize the curl connection and set its options
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => $endpoint,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_POST => true,
    CURLOPT_HEADER => false,
    CURLOPT_HTTPHEADER => $header,
    CURLOPT_POSTFIELDS => $binary,
    CURLOPT_FOLLOWLOCATION => true
));

# send the audio transcription request and print the response
$body = curl_exec($ch);
if ($body === false) {
    echo "Request failed: " . curl_error($ch) . "\n";
} else {
    echo $body;
}
curl_close($ch);
```
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Transcriber {
    public static void main(String[] args) throws IOException {
        String secretKey = "SECRET_KEY";
        // endpoint and options to start a transcription task
        URL endpoint = URI.create("https://api.speechtext.ai/recognize?key=" + secretKey
                + "&language=en-US&punctuation=true&format=m4a").toURL();

        // load the audio file into memory
        byte[] postBody = Files.readAllBytes(Paths.get("/path/to/your/file.m4a"));

        // send the audio transcription request
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/octet-stream");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(postBody);
        }

        // read the response; 4xx/5xx responses are available only on the error stream
        int status = conn.getResponseCode();
        InputStream in = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in != null) {
            try (InputStream body = in) {
                System.out.println(status + " " + new String(body.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        conn.disconnect();
    }
}
```
Monthly subscription packages to suit any budget and use case
We guarantee SpeechText.AI subscribers that all files are deleted as soon as transcription is complete and that the connection to our servers is always encrypted. This means that your audio/video files are used for nothing other than automatic speech recognition and cannot be accessed by third parties. All our physical servers are located in Europe, and all our operations comply with European Union data protection laws.
Order receipt emails include a link to your Account Management site, which has separate tabs for Subscriptions, Account Details, and Payment Methods. The Subscriptions tab lists all active and inactive subscriptions, and the Manage command for each subscription lets you update the payment method or cancel the subscription. If you cancel a subscription, you can also uncancel it here, up until the deactivation date.
The speech recognition service also accepts HTTP GET requests, so it can transcribe audio/video data from public URLs (e.g., Google Drive, Dropbox). You can send GET requests with curl from a terminal window or even call the API directly from your web browser.
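The public-URL request passes the file's address as the `url` query parameter. Share links often carry their own query string (a Dropbox link ending in `?dl=1`, for example), so the address should be percent-encoded before it is embedded in the request; otherwise its `&` and `=` characters would be read as separators for the API's own parameters. A minimal Python sketch using only `urllib.parse`; `public_url_request` is a hypothetical helper, and the parameter names are copied from the public-URL curl example on this page.

```python
from urllib.parse import urlencode, urlsplit

RECOGNIZE_ENDPOINT = "https://api.speechtext.ai/recognize"


def public_url_request(secret_key: str, public_url: str, language: str = "en-US",
                       audio_format: str = "mp3", punctuation: bool = True) -> str:
    """Build a GET URL asking the service to transcribe a publicly reachable file.

    urlencode percent-encodes every value, so '&', '?' and '=' inside public_url
    cannot be mistaken for separators of the API's own query string.
    """
    if urlsplit(public_url).scheme not in ("http", "https"):
        raise ValueError("public_url must be an http(s) URL")
    params = {
        "key": secret_key,
        "url": public_url,
        "language": language,
        "punctuation": "true" if punctuation else "false",
        "format": audio_format,
    }
    return f"{RECOGNIZE_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # hypothetical share link that has its own query string
    print(public_url_request("SECRET_KEY", "https://example.com/audio.mp3?dl=1&token=abc"))
```

The resulting URL can be pasted into a browser or passed to `curl -X GET` as in the example above.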
Our speech-to-text service currently supports English, German, French, Spanish, Dutch, Italian, Portuguese, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Polish, Swedish, Norwegian, Danish, Finnish, Turkish, Romanian, Czech, Ukrainian, Greek, Thai, Indonesian, Vietnamese, Filipino, English (Global model, multiple accents), English (India), German (Austria), Portuguese (Brazil), Spanish (Mexico), and French (Canada).
The audio summarization feature automatically extracts the most important ideas from audio or video content and generates an accurate summary of the transcript. The service can also automatically detect and highlight key phrases in the transcription results.
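Both features are requested when fetching results: the results curl example on this page sets `summary=true`, `summary_size=15`, `highlights=true` and `max_keywords=10`. The page does not say what unit `summary_size` uses (a percentage of the transcript, a number of sentences, or something else), so the Python sketch below simply copies 15 as its default. `summary_results_url` is a hypothetical helper built on the standard library's `urlencode`; with the placeholder key and task ID it reproduces the URL from the curl example.

```python
from urllib.parse import urlencode

RESULTS_ENDPOINT = "https://api.speechtext.ai/results"


def summary_results_url(secret_key: str, task_id: str, summary_size: int = 15,
                        max_keywords: int = 10) -> str:
    """Build the GET URL that requests a transcript with a summary and key-phrase highlights.

    Parameter names come from the results curl example; the meaning of
    summary_size's unit is an assumption left to the API documentation.
    """
    if summary_size <= 0 or max_keywords <= 0:
        raise ValueError("summary_size and max_keywords must be positive")
    params = {
        "key": secret_key,
        "task": task_id,
        "summary": "true",
        "summary_size": summary_size,
        "highlights": "true",
        "max_keywords": max_keywords,
    }
    # urlencode escapes any reserved characters in the key or task ID
    return f"{RESULTS_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(summary_results_url("SECRET_KEY", "TASK_ID"))
```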
The speech-to-text API is powered by deep learning technologies to transcribe speech quickly and accurately. Our state-of-the-art speech recognition algorithm achieves an average word error rate of 3.8% across several open datasets (~1000 hours of speech). However, many factors can affect recognition accuracy, including audio quality, background noise, and multiple speakers talking at the same time.
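Word error rate (WER) counts the minimum number of word substitutions S, deletions D and insertions I needed to turn the recognized text into the reference transcript, divided by the number of reference words N: WER = (S + D + I) / N. A WER of 3.8% therefore means roughly 38 errors per 1,000 reference words. A minimal Python sketch of this textbook computation via word-level edit distance; it is not SpeechText.AI's own evaluation code, and it skips the text normalization (case folding, punctuation removal) that published benchmarks typically apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein (edit distance) table kept one row at a time.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # one substitution (sat -> sit) and one deletion (the) over 6 reference words
    print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Because insertions count as errors, WER can exceed 100% when the recognized text is much longer than the reference.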