CHANGELOG

Product improvements

Check out the AssemblyAI changelog to see weekly accuracy and product improvements our team has been working on.

Pricing decreases

We have decreased the price of Core Transcription from $0.90 per hour to $0.65 per hour, and decreased the price of Real-Time Transcription from $0.90 per hour to $0.75 per hour.

Both decreases were effective as of August 3rd.

Significant Summarization model speedups

We’ve implemented changes that yield a 43% to 200% increase in processing speed for our Summarization models, depending on which model is selected, with no measurable impact on the quality of results.

We have standardized the response from our API for automatically detected languages that do not support requested features. In particular, when Automatic Language Detection is used and the detected language does not support a feature included in the transcript request, our API will return null for that feature in the response.
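
For example, here is a minimal sketch of that behavior using raw HTTP requests; the audio URL is a placeholder, and auto_highlights is chosen only as an illustrative feature:

import time
import requests

headers = {"authorization": "YOUR_API_KEY"}  # placeholder key

# Request Automatic Language Detection together with a feature that the
# detected language may not support (auto_highlights is illustrative).
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json={
        "audio_url": "https://example.com/non_english_audio.mp3",  # placeholder
        "language_detection": True,
        "auto_highlights": True,
    },
    headers=headers,
)
transcript_id = response.json()["id"]

# Poll until the transcript finishes processing.
while True:
    transcript = requests.get(
        f"https://api.assemblyai.com/v2/transcript/{transcript_id}",
        headers=headers,
    ).json()
    if transcript["status"] in ("completed", "error"):
        break
    time.sleep(3)

# If the detected language does not support the requested feature,
# its field comes back as null (None once parsed in Python).
print(transcript.get("auto_highlights_result"))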

Introducing LeMUR, the easiest way to build LLM apps on spoken data

We've released LeMUR - our framework for applying LLMs to spoken data - for general availability. LeMUR is optimized for high accuracy on specific tasks:

  1. Custom Summary allows users to automatically summarize files in a flexible way
  2. Question & Answer allows users to ask specific questions about audio files and receive answers to these questions
  3. Action Items allows users to automatically generate a list of action items from virtual or in-person meetings

Additionally, LeMUR can be applied to groups of transcripts in order to analyze a set of files at once, allowing users to, for example, summarize many podcast episodes or ask questions about a series of customer calls.

Our Python SDK allows users to work with LeMUR in just a few lines of code:

# version 0.15 or greater
import assemblyai as aai

# set your API key
aai.settings.api_key = f"{API_TOKEN}"

# transcribe the audio file (meeting recording)
transcriber = aai.Transcriber()
transcript = transcriber.transcribe("https://storage.googleapis.com/aai-web-samples/meeting.mp4")

# generate and print action items
result = transcript.lemur.action_items(
    context="A GitLab meeting to discuss logistics",
    answer_format="**<topic header>**\n<relevant action items>\n",
)

print(result.response)
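
The same SDK pattern extends to groups of transcripts, as mentioned above. Here is a hedged sketch (the podcast URLs are placeholders, and the method names assume SDK version 0.15 or greater):

import assemblyai as aai

# set your API key
aai.settings.api_key = f"{API_TOKEN}"

# transcribe several recordings as a group (placeholder URLs)
transcriber = aai.Transcriber()
transcript_group = transcriber.transcribe_group([
    "https://example.com/podcast_episode_1.mp3",
    "https://example.com/podcast_episode_2.mp3",
])

# summarize the whole set of files at once with LeMUR
result = transcript_group.lemur.summarize(
    context="A series of podcast episodes about machine learning",
    answer_format="Bullet points",
)

print(result.response)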

Learn more about LeMUR in our blog post, or jump straight into the code in our associated Colab notebook.

Introducing our Conformer-2 model

We've released Conformer-2, our latest AI model for automatic speech recognition. Conformer-2 is trained on 1.1M hours of English audio data, extending Conformer-1 to provide improvements on proper nouns, alphanumerics, and robustness to noise.

Conformer-2 is now the default model for all English audio files sent to the v2/transcript endpoint for async processing and introduces no breaking changes.
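
Because Conformer-2 is now the default, existing requests pick it up without modification; as a minimal illustrative sketch (the audio URL is a placeholder):

import requests

headers = {"authorization": "YOUR_API_KEY"}  # placeholder key

# An unchanged async request to v2/transcript now runs on Conformer-2
# for English audio; no new parameters are required.
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json={"audio_url": "https://example.com/english_audio.mp3"},  # placeholder
    headers=headers,
)
print(response.json()["id"])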

We’ll be releasing Conformer-2 for real-time English transcriptions within the next few weeks.

Read our full blog post about Conformer-2 here. You can also try it out in our Playground.

New parameter and timestamps fix

We’ve introduced a new, optional speech_threshold parameter, allowing users to only transcribe files that contain at least a specified percentage of spoken audio, represented as a ratio in the range [0, 1].

You can use the speech_threshold parameter with our Python SDK as below:

import assemblyai as aai

aai.settings.api_key = f"{ASSEMBLYAI_API_KEY}"

config = aai.TranscriptionConfig(speech_threshold=0.1)

file_url = "https://github.com/AssemblyAI-Examples/audio-examples/raw/main/20230607_me_canadian_wildfires.mp3"

transcriber = aai.Transcriber()
transcript = transcriber.transcribe(file_url, config)

print(transcript.text)

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines from ...

If the percentage of speech in the audio file does not meet or surpass the provided threshold, then the value of transcript.text will be None and you will receive an error:

if not transcript.text:
    print(transcript.error)

Audio speech threshold 0.9461 is below the requested speech threshold value 1.0

As usual, you can also include the speech_threshold parameter in the JSON body of raw HTTP requests made from any programming language.
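
For instance, here is a minimal sketch of such a raw request in Python (the audio URL is a placeholder):

import requests

headers = {"authorization": "YOUR_API_KEY"}  # placeholder key

# Include speech_threshold directly in the JSON body of the request.
response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    json={
        "audio_url": "https://example.com/mostly_music.mp3",  # placeholder
        "speech_threshold": 0.1,
    },
    headers=headers,
)
print(response.json()["id"])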

We’ve fixed a bug in which timestamps could sometimes be incorrectly reported for our Topic Detection and Content Safety models.

We’ve made improvements to detect and remove hallucinations that would sometimes occur with specific audio patterns.