Fish Audio S2: Next-Gen High-Performance Open-Source TTS Model Released

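📅 2026-03-11 09:55 · AIGCLINK · Artificial Intelligence · 3 min read · 3,596 characters · Score: 84

Tags: TTS, Fish Audio S2, Speech Synthesis, Open-Source Model, Artificial Intelligence

📌 One-sentence summary: Fish Audio has released the S2 version of its TTS model, featuring very low latency, multi-speaker support, and strong voice control.

📝 Detailed summary: This post details the core features of Fish Audio S2, a new text-to-speech (TTS) model. Its reported performance is a real-time factor (RTF) of 0.195 and a first-packet latency of 100 ms, and it can include multiple speakers in a single generation. It supports more than 80 languages and allows fine-grained voice control through 15,000+ natural-language description tags. According to the quoted tweet, the model is open-source, which makes it a good fit for real-time dialogue, multi-character storytelling, long-text narration, and other scenarios that demand both low latency and expressiveness.

📊 Article info: AI score: 84 Source: A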
### AIGCLINK

@aigclink

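Fish's TTS has a new release: Fish Audio S2, with an RTF of 0.195, a first-packet latency of 100 ms, and support for multiple speakers in a single generation.

Its audio clarity and expressive naturalness are decent by TTS standards.

It supports 80+ languages, long-form text, and voice control through 15,000+ natural-language description tags.

It can comfortably handle real-time dialogue, multi-character stories, and long-text narration.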

#TTS #FishAudioS2

#### Fish Audio

@FishAudio · 11h ago

Today we launch Fish Audio S2, a new generation of expressive TTS with absurdly controllable emotion.

  • open-source
  • sub 150ms latency
  • multi-speaker in one pass
Real freedom of speech starts now 👇

Mar 11, 2026, 1:55 AM

1 Reply · 2 Retweets · 11 Likes · 3,325 Views

One Sentence Summary

Fish Audio launches the S2 version of its TTS model, featuring ultra-low latency, multi-speaker support, and advanced voice control capabilities.

Summary

This tweet details the core features of Fish Audio S2, a new text-to-speech (TTS) model. The model delivers outstanding performance with a Real-Time Factor (RTF) of 0.195 and first-packet latency of only 100ms, while supporting multiple speakers in a single generation. Technically, it supports over 80 languages and allows for precise voice control via 15,000+ natural language description tags. As an open-source model, it is perfectly suited for scenarios requiring high real-time performance and expressiveness, such as live dialogue, multi-character storytelling, and long-form narration.
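To put the RTF figure in context: real-time factor is conventionally defined as synthesis time divided by the duration of the audio produced, so a value below 1 means faster than real time. The following is a minimal sketch of that arithmetic, assuming the conventional definition. The post does not say how or on what hardware the 0.195 figure was measured, and the function names here are illustrative only.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(rtf: float, audio_seconds: float) -> float:
    """Estimated compute time needed to produce audio_seconds of speech."""
    return rtf * audio_seconds


RTF_S2 = 0.195  # figure quoted in the post; measurement setup is not stated

# Under this definition, 60 s of speech would take about 11.7 s to generate.
print(f"{synthesis_time(RTF_S2, 60.0):.1f} s of compute for 60 s of audio")

# Equivalent throughput relative to real time is 1 / RTF.
print(f"{1 / RTF_S2:.1f}x faster than real time")
```

Note that RTF describes throughput, while the 100 ms first-packet latency describes how soon audio starts streaming; the two numbers are reported separately because a model can be fast overall yet slow to emit its first chunk, or the reverse.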

AI Score: 84

Influence Score: 5

Published At: Today

Language: Chinese

Tags: TTS, Fish Audio S2, Speech Synthesis, Open Source Model

Published: 2026-03-11 09:55:54 · Indexed: 2026-03-11 12:00:44
