Cohere's open-weight ASR model hits 5.4% word error rate — low enough to replace speech APIs in production pipelines
VentureBeat, by Emilia David
One Sentence Summary
Cohere has released 'Transcribe,' an open-weight, production-grade ASR model with a 5.42% word error rate, designed to offer enterprise-level accuracy and self-hosting capabilities that compete with closed APIs.
Summary
Cohere's new open-weight ASR model, Transcribe, addresses the enterprise dilemma of choosing between high-accuracy closed APIs and less-performant open models. With 2 billion parameters and an Apache-2.0 license, it achieves a state-of-the-art 5.42% word error rate (WER), outperforming competitors like OpenAI's Whisper Large v3 and ElevenLabs Scribe v2. The model is optimized for local deployment, enabling organizations to maintain data privacy and residency while integrating high-performance speech-to-text into RAG pipelines and agent workflows.
Main Points
* 1. Competitive performance in ASR benchmarks. Transcribe achieves a 5.42% WER, currently leading the Hugging Face ASR leaderboard and surpassing established models like Whisper Large v3 (7.44%) and ElevenLabs Scribe v2 (5.83%).
* 2. Enterprise-ready open-weight architecture. Licensed under Apache-2.0, the model allows for local, self-hosted deployment, effectively eliminating the data residency and privacy risks associated with closed cloud APIs.
* 3. Optimized for production pipelines. Designed for high throughput and accuracy, the model is suitable for direct integration into voice-powered automations, RAG pipelines, and agent workflows where latency and control are critical.
Metadata
AI Score
86
Website venturebeat.com
Published At Today
Length 482 words (about 2 min)
Enterprises building voice-enabled workflows have had limited options for production-grade transcription: closed APIs with data residency risks, or open models that trade accuracy for deployability. Cohere's new open-weight ASR model, Transcribe, is built to compete on all four key differentiators — contextual accuracy, latency, control and cost.
Cohere says that Transcribe outperforms current leaders on accuracy — and unlike closed APIs, it can run on an organization's own infrastructure.
Transcribe, which can be accessed via an API or in Cohere’s Model Vault as cohere-transcribe-03-2026, has 2 billion parameters and is licensed under Apache-2.0. The company said the model has an average word error rate (WER) of just 5.42%, meaning it makes fewer transcription mistakes than comparable models.
It’s trained on 14 languages: English, French, German, Italian, Spanish, Greek, Dutch, Polish, Portuguese, Chinese, Japanese, Korean, Vietnamese and Arabic. The company did not specify which Chinese dialect the model was trained on.
Cohere said it trained the model “with a deliberate focus on minimizing WER, while keeping production readiness top-of-mind.” According to Cohere, the result is a model that enterprises can plug directly into voice-powered automations, transcription pipelines, and audio search workflows.
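For context on the headline metric: WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the model's output, divided by the number of reference words. A minimal sketch of the calculation follows; this is not Cohere's evaluation code, and leaderboard runs typically add text normalization on top of it.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance between the
    reference and hypothesis, divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

A 5.42% average WER means roughly one word error per 18 reference words across the benchmark's test sets.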
Self-hosted transcription for production pipelines
Until recently, enterprise transcription has been a trade-off — closed APIs offered accuracy but locked in data; open models offered control but lagged on performance. Unlike Whisper, which launched as a research model under MIT license, Transcribe is available for commercial use from release and can run on an organization's own local GPU infrastructure. Early users flagged the commercial-ready open-weight approach as meaningful for enterprise deployments.
Organizations can bring Transcribe to their own local instances; Cohere said the model has a manageable inference footprint for local GPUs. The company said it achieved this because the model “extends the Pareto frontier, delivering state-of-the-art accuracy (low WER) while sustaining best-in-class throughput (high RTFx) within the 1B+ parameter model cohort.”
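RTFx in the quote above is the inverse real-time factor: seconds of audio transcribed per second of wall-clock compute, so higher is better and anything above 1.0 is faster than real time. A generic illustration of how the metric is derived (the timing harness here is a sketch, not Cohere's benchmark code, and `transcribe` stands in for any model wrapper):

```python
import time

def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio transcribed per
    second of compute. RTFx > 1.0 means faster than real time."""
    return audio_seconds / processing_seconds

def benchmark(transcribe, audio, audio_seconds):
    """Time one transcription call with any callable model wrapper
    and return the transcript alongside the measured RTFx."""
    start = time.perf_counter()
    text = transcribe(audio)
    elapsed = time.perf_counter() - start
    return text, rtfx(audio_seconds, elapsed)

# e.g. a 10-minute recording transcribed in 4 seconds:
# rtfx(600, 4) -> 150.0
```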
How Transcribe stacks up
Transcribe outperformed speech-model stalwarts, including Whisper from OpenAI, which powers the voice feature of ChatGPT, and ElevenLabs, which many big retail brands deploy. It currently tops the Hugging Face ASR leaderboard, leading with an average word error rate of 5.42%, outperforming Whisper Large v3 at 7.44%, ElevenLabs Scribe v2 at 5.83%, and Qwen3-ASR-1.7B at 5.76%.
Transcribe also performed well on the other datasets tested by Hugging Face. On the AMI dataset, which measures meeting understanding and dialogue analysis, Transcribe logged a WER of 8.15%. On the VoxPopuli dataset, which tests understanding of different accents, the model scored 5.87%, beaten only by Zoom Scribe.
Early users have flagged accuracy and local deployment as the standout factors — particularly for teams that have been routing audio data through external APIs and want to bring that workload in-house.
For engineering teams building RAG pipelines or agent workflows with audio inputs, Transcribe offers a path to production-grade transcription without the data residency and latency penalties of closed APIs.
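The shape of such a pipeline is roughly: transcribe audio locally, chunk the transcripts, index the chunks, and retrieve against user queries. A minimal sketch follows; the `transcribe` callable is a stand-in for a locally hosted ASR model, and the keyword-overlap ranking is a stand-in for the embedding search a production RAG system would use.

```python
def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a transcript into fixed-size word windows for indexing."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

def retrieve(chunks: list[str], query: str, k: int = 1) -> list[str]:
    """Rank chunks by keyword overlap with the query (a naive
    stand-in for embedding similarity)."""
    q = set(query.lower().split())
    ranked = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return ranked[:k]

def audio_rag(transcribe, audio_files, query):
    """Transcribe each file, pool the chunks, retrieve for the query."""
    pool = []
    for audio in audio_files:
        pool.extend(chunk(transcribe(audio)))
    return retrieve(pool, query)
```

Because every step runs in-process, no audio or transcript text ever leaves the organization's infrastructure, which is the point of pairing a self-hosted ASR model with a local retrieval stack.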