CHANGELOG

Product improvements

Check out the AssemblyAI changelog to see weekly accuracy and product improvements our team has been working on.

New parameter and timestamps fix

We’ve introduced a new, optional speech_threshold parameter, allowing users to transcribe only files that contain at least a specified proportion of spoken audio, expressed as a ratio in the range [0, 1].

You can use the speech_threshold parameter with our Python SDK as below:

import assemblyai as aai

aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"

# Only transcribe the file if at least 10% of its audio is speech
config = aai.TranscriptionConfig(speech_threshold=0.1)

file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(file_url, config)

print(transcript.text)
Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from ...

If the proportion of speech in the audio file does not meet the provided threshold, then the value of transcript.text will be None and you will receive an error:

if not transcript.text:
    # When the speech threshold is not met, the error explains why
    print(transcript.error)
Audio speech threshold 0.9461 is below the requested speech threshold value 1.0

As usual, you can also include the speech_threshold parameter in the JSON body of raw HTTP requests from any programming language.
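
For example, here’s a minimal sketch of a raw request from Python using the requests library against our v2 transcript endpoint (YOUR_API_KEY is a placeholder):

import requests

headers = {"authorization": "YOUR_API_KEY"}  # placeholder for your API key
json_body = {
    "audio_url": "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3",
    "speech_threshold": 0.1,
}

# Submit the file for transcription with the speech threshold applied
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=headers,
    json=json_body,
)

print(response.json()["id"])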

We’ve fixed a bug in which timestamps could sometimes be incorrectly reported for our Topic Detection and Content Safety models.

We’ve made improvements to detect and remove a hallucination that would sometimes occur with specific audio patterns.

Character sequence improvements

We’ve fixed an issue in which the last character in an alphanumeric sequence could fail to be transcribed. The fix is effective immediately and constitutes a 95% reduction in errors of this type.

We’ve fixed an issue in which consecutive identical numbers in a long number sequence could fail to be transcribed. This fix is effective immediately and constitutes a 66% reduction in errors of this type.

Speaker Labels improvement

We’ve made improvements to the Speaker Labels model, adjusting the impact of the speakers_expected parameter to better allow the model to determine the correct number of unique speakers, especially in cases where one or more speakers talk substantially less than the others.
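
As a reminder, you can pass speakers_expected alongside speaker_labels in our Python SDK. Here’s a minimal sketch (the file path is a placeholder):

import assemblyai as aai

aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"

# Hint that the file contains three unique speakers
config = aai.TranscriptionConfig(speaker_labels=True, speakers_expected=3)

transcript = aai.Transcriber().transcribe("./meeting.mp3", config)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")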

We’ve expanded our caching system to include additional third-party resources, helping to further ensure continued operations in the event that external resources become unavailable.

Significant processing time improvement

We’ve made significant improvements to our transcoding pipeline, resulting in a 98% overall speedup in transcoding time and a 12% overall improvement in processing time for our asynchronous API.

We’ve implemented a caching system for some third-party resources to ensure continued operations in the event that external resources become unavailable.

Announcing LeMUR - our new framework for applying powerful LLMs to transcribed speech

We’re introducing our new framework LeMUR, which makes it simple to apply Large Language Models (LLMs) to transcripts of audio files up to 10 hours in length.

LLMs unlock a range of impressive capabilities that allow teams to build powerful Generative AI features. However, building these features is difficult due to the limited context windows of modern LLMs, among other challenges that necessitate the development of complicated processing pipelines.

LeMUR circumvents these challenges by making it easy to apply LLMs to transcribed speech, meaning that product teams can focus on building differentiating Generative AI features rather than on infrastructure. Learn more about what LeMUR can do and how it works in our announcement blog, or jump straight to trying LeMUR in our Playground.
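
Once you have access, applying LeMUR to a transcript via our Python SDK can look like the following sketch (the lemur.task call and file path shown here are illustrative):

import assemblyai as aai

aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"

# Transcribe a long audio file, then apply an LLM to the transcript
transcript = aai.Transcriber().transcribe("./podcast_episode.mp3")

# Ask LeMUR to act on the transcript with a free-form prompt
result = transcript.lemur.task("Summarize the key takeaways from this episode.")

print(result.response)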