CHANGELOG

Product improvements

Check out the AssemblyAI changelog to see weekly accuracy and product improvements our team has been working on.


New Streaming STT features

We’ve added a new message type, SessionInformation, to our Streaming Speech-to-Text (STT) service. It is sent immediately before the final SessionTerminated message when a Streaming session closes, and its audio_duration_seconds field reports the total audio duration processed during the session. Customers can use this value to run end-user-specific billing calculations.

To enable this feature, set the enable_extra_session_information query parameter to true when connecting to a Streaming WebSocket.

endpoint_str = 'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=8000&enable_extra_session_information=true'

This feature will be rolled out in our SDKs soon.
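
Until SDK support lands, the connection URL and the new message can be handled by hand. The following is a minimal sketch rather than an official client: the helper names are hypothetical, and it assumes that each WebSocket message is a JSON object whose type is carried in a message_type key.

```python
import json
from urllib.parse import urlencode

BASE_URL = "wss://api.assemblyai.com/v2/realtime/ws"


def build_endpoint(sample_rate, **flags):
    """Build the WebSocket URL; boolean flags are serialized as 'true'/'false'."""
    params = {"sample_rate": sample_rate}
    for name, value in flags.items():
        params[name] = str(value).lower() if isinstance(value, bool) else value
    return f"{BASE_URL}?{urlencode(params)}"


def audio_duration_from_message(raw_message):
    """Return audio_duration_seconds for a SessionInformation message, else None."""
    message = json.loads(raw_message)
    # Assumption: the message type is carried in a "message_type" key.
    if message.get("message_type") == "SessionInformation":
        return message.get("audio_duration_seconds")
    return None


endpoint_str = build_endpoint(8000, enable_extra_session_information=True)
```

The returned duration can then be attributed to whichever end user owned the session.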

We’ve added a new feature to our Streaming STT service that lets users disable Partial Transcripts in a Streaming session. Our Streaming API sends two types of transcripts: Partial Transcripts (unformatted and unpunctuated), which gradually build up the current utterance, and Final Transcripts, which are sent when an utterance is complete and contain the entire utterance, punctuated and formatted.

Users can now set the disable_partial_transcripts query parameter to true when connecting to a Streaming WebSocket to disable the sending of Partial Transcript messages.

endpoint_str = 'wss://api.assemblyai.com/v2/realtime/ws?sample_rate=8000&disable_partial_transcripts=true'

This feature will be rolled out in our SDKs soon.
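
With Partial Transcripts disabled, a client only needs to handle Final Transcripts. As a rough sketch, assuming each message is a JSON object with a message_type key set to "FinalTranscript" and a text field (both assumptions, and the function name is hypothetical), the full text of a session could be assembled like this:

```python
import json


def collect_final_text(raw_messages):
    """Join the text of every Final Transcript, ignoring all other message types."""
    finals = []
    for raw in raw_messages:
        message = json.loads(raw)
        # Assumption: final transcripts are tagged "FinalTranscript" and carry "text".
        if message.get("message_type") == "FinalTranscript":
            finals.append(message.get("text", ""))
    return " ".join(finals)
```

The same function also works when Partial Transcripts are still enabled, since it skips any message that is not a Final Transcript.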

We have fixed a bug in our async transcription service, eliminating "File does not appear to contain audio" errors. Previously, this error surfaced in edge cases where our transcoding pipeline did not have enough resources to transcode a given file and failed due to resource starvation.

Dual channel transcription improvements

We’ve made improvements to how utterances are handled during dual-channel transcription. In particular, the transcription service now has elevated sensitivity when detecting utterances, leading to improved utterance insertions when there is overlapping speech on the two channels.

LeMUR concurrency fix

We’ve fixed a temporary issue in which users with low account balances would occasionally be rate-limited to a value less than 30 when using LeMUR.

Fewer "File does not appear to contain audio" errors

We’ve fixed an edge-case bug in our async API, leading to a significant reduction in errors that say "File does not appear to contain audio". Users can expect to see an immediate reduction in this type of error. If this error does occur, users should retry their requests, because retries are generally successful.
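
The retry advice above can be sketched as a small helper. This is an illustrative pattern, not part of any AssemblyAI SDK: submit is a hypothetical callable that performs one transcription request and returns a dict, and the sketch assumes a failed result carries the message in an "error" field.

```python
import time

AUDIO_ERROR = "File does not appear to contain audio"


def transcribe_with_retry(submit, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call submit() until the result lacks the audio error or attempts run out."""
    result = None
    for attempt in range(max_attempts):
        result = submit()
        # Assumption: a failed transcript reports its message in an "error" field.
        error = result.get("error") or ""
        if AUDIO_ERROR not in error:
            return result
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return result
```

Only this specific error triggers a retry; any other result, successful or not, is returned to the caller immediately.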

We’ve made improvements to our transcription service autoscaling, leading to improved turnaround times for requests that use Word Boost when there is a spike in requests to our API.

New developer controls for real-time end-of-utterance

We have released developer controls for real-time end-of-utterance detection, giving developers control over when an utterance is considered complete. Developers can now either manually force the end of an utterance or set a threshold for how long a silence must last before an utterance is considered complete.
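
As a sketch of how these two controls might be used, the helpers below build JSON messages that a client would send over an open Streaming WebSocket. The field names end_utterance_silence_threshold and force_end_utterance, and the millisecond unit, are assumptions not stated in this changelog entry, so check them against the API reference before relying on them. The function names are hypothetical.

```python
import json


def silence_threshold_message(milliseconds):
    """Build a message that sets the silence duration that ends an utterance."""
    if milliseconds < 0:
        raise ValueError("silence threshold must be non-negative")
    # Assumption: the field name and millisecond unit are not confirmed by this changelog.
    return json.dumps({"end_utterance_silence_threshold": milliseconds})


def force_end_message():
    """Build a message that manually ends the current utterance."""
    # Assumption: the field name is not confirmed by this changelog.
    return json.dumps({"force_end_utterance": True})
```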

We have made changes to our English async transcription service that improve sentence segmentation for our Sentiment Analysis, Topic Detection, and Content Moderation models. The improvements fix a bug in which these models would sometimes split sentences at titles that end in periods, such as Dr. and Mrs.
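
To illustrate the class of bug described here, the toy splitter below treats a period as a sentence boundary unless the token is a known title. It is only an illustration of the idea, not AssemblyAI's implementation, and the title list is a hypothetical example.

```python
# Hypothetical list of titles whose trailing period does not end a sentence.
TITLES = {"Dr.", "Mr.", "Mrs.", "Ms.", "Prof."}


def split_sentences(text):
    """Split on tokens ending in '.', '!' or '?', except for known titles."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        if token.endswith((".", "!", "?")) and token not in TITLES:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks closing punctuation.
        sentences.append(" ".join(current))
    return sentences
```

A naive splitter without the title check would cut "Dr. Smith arrived late." into two fragments, which is the behavior the fix removes.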

We have fixed an issue in which transcriptions of very long files (8+ hours) with disfluencies enabled would fail with an error.