
AssemblyAI Changelog

Check out the AssemblyAI changelog to see weekly accuracy and product improvements our team has been working on.



New Language Code Parameter for English Spelling

  • Added a new language_code parameter when making requests to /v2/transcript.
  • Developers can set this to en_us, en_uk, or en_au, which will ensure the correct English spelling is used: US English (default), British English, or Australian English.
  • Quick note: customers who were historically using the assemblyai_en_au or assemblyai_en_uk acoustic models don't need to set language_code; the parameter is redundant for those models. An example request is shown below.
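A minimal request sketch, assuming the standard POST to /v2/transcript with an audio_url in the JSON body (the API key and file URL below are placeholders):

```python
import requests

# Minimal sketch: request a transcript with British English spelling.
# Replace YOUR_API_KEY and the audio URL with your own values.
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": "YOUR_API_KEY"},
    json={
        "audio_url": "https://example.com/audio.mp3",  # publicly reachable file
        "language_code": "en_uk",  # en_us (default), en_uk, or en_au
    },
)
print(response.json()["id"])  # transcript ID to poll for the result
```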

  • Fixed an edge case where some files with prolonged silences would occasionally have a single word predicted, such as "you" or "hi."


New Features Coming Soon, Bug Fixes

  • This week, our engineering team has been hard at work preparing for the release of exciting new features (a hedged request sketch follows this list):
  • Chapter Detection: Automatically summarize audio and video files into segments (aka "chapters").
  • Sentiment Analysis: Determine the sentiment of sentences in your transcript as "positive", "negative", or "neutral".
  • Disfluencies: Detect filler words like "um" and "uh".
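Since these features haven't shipped yet, the sketch below is purely hypothetical: the parameter names auto_chapters, sentiment_analysis, and disfluencies are assumptions for illustration, not confirmed API fields.

```python
import requests

# Hypothetical sketch only: the three feature flags below are assumed
# names for illustration; the released parameters may differ.
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers={"authorization": "YOUR_API_KEY"},
    json={
        "audio_url": "https://example.com/audio.mp3",
        "auto_chapters": True,       # assumed flag for Chapter Detection
        "sentiment_analysis": True,  # assumed flag for Sentiment Analysis
        "disfluencies": True,        # assumed flag to keep "um"/"uh" in text
    },
)
print(response.json()["id"])
```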

  • Improved average real-time latency by 2.1% and p99 latency by 0.06%.

  • Fixed an edge case where utterances for dual-channel audio files would occasionally be returned with a confidence score greater than 1.0.
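For context, here is a minimal sketch of where these scores live in the response, assuming a completed dual-channel transcript fetched by ID from /v2/transcript (the transcript ID and API key are placeholders):

```python
import requests

# Sketch: fetch a completed dual-channel transcript and confirm each
# utterance confidence now falls within [0.0, 1.0].
transcript_id = "YOUR_TRANSCRIPT_ID"  # placeholder
response = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
    headers={"authorization": "YOUR_API_KEY"},
)
for utterance in response.json().get("utterances") or []:
    assert 0.0 <= utterance["confidence"] <= 1.0
    print(f'{utterance["confidence"]:.3f}  {utterance["text"]}')
```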


Improved v8 Model Processing Speed

  • Improved the API's ability to handle audio/video files with a duration over 8 hours.

  • Further improved transcription processing times by 12%.
  • Fixed an edge case in our responses for dual-channel audio files: if speaker 2 interrupted speaker 1, speaker 1's text would be split into multiple turns rather than kept together contextually as a single turn.


v8 Transcription Model Released

  • Today, we're happy to announce the release of our most accurate Speech Recognition model for asynchronous transcription to date: version 8 (v8).
  • This new model dramatically improves overall accuracy (up to 19% relative), and proper noun accuracy as well (up to 25% relative).
  • You can read more about the v8 model on our blog.

  • Fixed an edge case where a small percentage of short (under 60 seconds) dual-channel audio files with identical audio on each channel resulted in repeated words in the transcription.


v2 Real-Time and v4 Topic Detection Models Released

  • Launched our v2 Real-Time Streaming Transcription model (read more on our blog).
  • This new model improves the accuracy of our Real-Time Streaming Transcription by ~10% (a minimal connection sketch follows below).
  • Launched our Topic Detection v4 model, with an accuracy boost of ~8.37% over v3 (read more on our blog).
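For readers new to the streaming API, below is a minimal connection sketch in Python. It assumes the commonly documented WebSocket endpoint and message shapes (a session handshake, base64-encoded audio_data payloads, FinalTranscript messages); treat the exact field names, and the extra_headers argument (which newer websockets releases rename to additional_headers), as assumptions rather than a definitive implementation.

```python
import asyncio
import base64
import json

import websockets

# Assumed endpoint for 16 kHz, 16-bit, mono PCM audio.
URL = "wss://api.assemblyai.com/v2/realtime/ws?sample_rate=16000"

async def stream(pcm_path: str, api_key: str) -> None:
    async with websockets.connect(
        URL, extra_headers={"Authorization": api_key}
    ) as ws:
        await ws.recv()  # consume the session-begins handshake message

        async def send_audio() -> None:
            with open(pcm_path, "rb") as f:
                # 3200 bytes is roughly 100 ms of 16 kHz, 16-bit, mono PCM.
                while chunk := f.read(3200):
                    await ws.send(json.dumps(
                        {"audio_data": base64.b64encode(chunk).decode()}
                    ))
                    await asyncio.sleep(0.1)  # pace roughly in real time
            await ws.send(json.dumps({"terminate_session": True}))

        async def print_finals() -> None:
            # The loop ends when the server closes the connection.
            async for message in ws:
                data = json.loads(message)
                if data.get("message_type") == "FinalTranscript":
                    print(data.get("text", ""))

        await asyncio.gather(send_audio(), print_finals())

asyncio.run(stream("audio.raw", "YOUR_API_KEY"))
```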