Today, we're happy to announce the release of our most accurate Speech Recognition model to date: version 8 (v8).
This new model improves overall accuracy by up to 19% relative and proper noun accuracy by up to 25% relative.
You can read more about the v8 model on our blog. It improves proper noun and accent recognition without requiring you to specify an acoustic or language model.
We fixed an edge case where a small percentage of short (under 60 seconds) dual-channel audio files with the same audio on each channel were not collapsed to mono, resulting in repeated words.
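To illustrate the idea behind this fix, here is a minimal sketch of collapsing a dual-channel file to mono when both channels carry the same audio. This is not our production implementation: the function name `collapse_if_duplicate` is hypothetical, and the exact sample-for-sample comparison is an assumption made for simplicity (a real pipeline would more likely allow a small tolerance).

```python
from typing import List, Sequence


def collapse_if_duplicate(left: Sequence[int], right: Sequence[int]) -> List[List[int]]:
    """Return a single channel when both channels are identical, else both.

    Hypothetical helper: exact equality is an assumption for this sketch.
    """
    same_length = len(left) == len(right)
    if same_length and all(a == b for a, b in zip(left, right)):
        # Identical audio on both channels: transcribe it once, as mono.
        return [list(left)]
    # Channels differ: keep them separate so each is transcribed on its own.
    return [list(left), list(right)]
```

Transcribing the single collapsed channel is what prevents each word from appearing twice in the output.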
This week we have released our Topic Detection v3 model.
This model improves on Topic Detection v2's ability to detect topics based on context. In the following text segment, the model was able to predict "Rugby" without the sport being mentioned directly.
Instead of relying on the word "Rugby," the model identified "Ed Robinson" as a rugby coach and "six nations" as a rugby tournament, and correctly classified the text as a conversation about rugby.
Fixes and Improvements
We also released a fix for our PII Redaction feature that corrects an issue where the model would sometimes over-redact phone numbers as credit card information or social security numbers.
Our model will now better identify phone numbers in cases where they are not explicitly referred to as a phone number, so they can be correctly redacted or left unredacted based on the policies submitted with the POST request.
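As a rough sketch of how redaction policies travel with the POST request, the snippet below builds a JSON request body that asks for phone numbers only. The field names `audio_url`, `redact_pii`, and `redact_pii_policies`, the policy value `phone_number`, and the example URL are assumptions for illustration; check the API reference for the exact names before relying on them.

```python
import json
from typing import Dict, Iterable


def build_redaction_request(audio_url: str, policies: Iterable[str]) -> Dict[str, object]:
    """Build a request body that enables PII redaction for the given policies.

    All field names here are assumed, not confirmed by this changelog.
    """
    return {
        "audio_url": audio_url,
        "redact_pii": True,
        "redact_pii_policies": list(policies),
    }


# Only phone numbers are listed, so a phone number that is no longer mistaken
# for a credit card or social security number is redacted under this policy.
payload = build_redaction_request("https://example.com/call.mp3", ["phone_number"])
body = json.dumps(payload)
```

With a policy list like this, a number that is correctly identified as a phone number is redacted, while one misread as a credit card number would have been handled under a different policy.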
This week, our engineering team has been focused on our v8 transcription model, which will introduce a major accuracy improvement across all audio types. Stay tuned! In the meantime, we shipped a few bug fixes around our real-time transcription API and the /v2/stream API.
Fixed an edge case where higher sample rates would occasionally trigger a "Client sent audio too fast" error from the real-time streaming API.
Fixed an edge case where some real-time streams were held open after a customer's session went idle.
Fixed an edge case in the /v2/stream endpoint, where long periods of silence would occasionally cause automatic punctuation to fail.
Improved error handling for when a customer sends non-JSON input, allowing us to communicate these occurrences more effectively.
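For context on the "Client sent audio too fast" error mentioned above, the sketch below shows one way a client might pace its audio so that it never sends faster than real time, whatever the sample rate. The helper names are hypothetical, and the audio format (16-bit mono PCM) is an assumption; the threshold the server actually enforces is not stated here.

```python
from typing import List, Sequence


def chunk_duration_seconds(num_bytes: int, sample_rate: int,
                           bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Seconds of audio in a raw PCM chunk (assumes 16-bit mono by default)."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)


def pacing_schedule(chunk_sizes: Sequence[int], sample_rate: int) -> List[float]:
    """Earliest send time for each chunk, in seconds from the stream start.

    Sending chunk i no earlier than offsets[i] keeps the client at or below
    real-time speed.
    """
    offsets = []
    elapsed = 0.0
    for size in chunk_sizes:
        offsets.append(elapsed)
        elapsed += chunk_duration_seconds(size, sample_rate)
    return offsets
```

A client could sleep until each offset before sending the matching chunk. Because the duration is derived from the sample rate, the same number of bytes represents less audio at higher sample rates, so the client can send more bytes per second without exceeding real time.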