Title: Jailbreaking with Classical Chinese: The Limitations of P...

URL Source: https://www.bestblogs.dev/status/2036994531628114080

Published Time: 2026-03-26 02:31:56

Markdown Content:

Jailbreaking with Classical Chinese: The Limitations of Pattern Matching in LLM Safety Guardrails

### 李继刚

@lijigang

Daily paper reading

--------- arxiv.org/abs/2602.22983

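Large language models have learned to refuse dangerous requests: ask one in English how to build a bomb and it refuses. Ask in modern Chinese and it refuses too. But what if you ask in Classical Chinese?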

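That is the weak spot this paper hits: LLM safety guardrails are trained almost entirely on modern languages. Classical Chinese makes up a tiny share of the training data, its grammar differs sharply from modern Chinese, and safety alignment largely fails on it. Worse, Classical Chinese is naturally suited to concealment: it is terse yet dense in meaning, heavy with metaphor, and a single character can carry several layers of implication. The model understands the content but fails to recognize the dangerous intent.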

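Earlier jailbreak attacks mostly stayed within modern English: role-play, logical inducement, gradient-based search. Some researchers tried low-resource languages, but no one had systematically targeted Classical Chinese. This paper proposes CC-BOS, a framework that automatically generates Classical Chinese jailbreak prompts, using a fruit-fly foraging algorithm to search an eight-dimensional strategy space for the best attack combination. The result: a 100% attack success rate on six mainstream LLMs.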
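To make the search idea concrete, here is a minimal sketch of a fruit-fly-style swarm search over an eight-dimensional discrete space. It is not the paper's CC-BOS implementation: the number of options per dimension, the `toy_score` objective (how many coordinates match a hidden target vector), and every function name are assumptions for illustration, and nothing here generates or evaluates prompts.

```python
import random

DIMS = 8      # eight strategy dimensions, as described in the post
OPTIONS = 5   # assumed number of choices per dimension (hypothetical)

def toy_score(candidate, target):
    """Toy objective: count coordinates that match a hidden target vector."""
    return sum(c == t for c, t in zip(candidate, target))

def perturb(center, rng, n_changes=2):
    """One 'fly' explores near the swarm centre by re-drawing a few coordinates."""
    fly = list(center)
    for i in rng.sample(range(DIMS), n_changes):
        fly[i] = rng.randrange(OPTIONS)
    return tuple(fly)

def fruit_fly_search(target, swarm_size=20, iterations=200, seed=0):
    rng = random.Random(seed)
    center = tuple(rng.randrange(OPTIONS) for _ in range(DIMS))
    best_score = toy_score(center, target)
    for _ in range(iterations):
        # Smell phase: flies scatter around the current swarm location.
        flies = [perturb(center, rng) for _ in range(swarm_size)]
        # Vision phase: the swarm moves to the best-scoring fly if it improves.
        top = max(flies, key=lambda f: toy_score(f, target))
        top_score = toy_score(top, target)
        if top_score > best_score:
            center, best_score = top, top_score
        if best_score == DIMS:
            break
    return center, best_score
```

The point of the sketch is the loop shape: scatter candidates around the current best location, score them, and move the swarm only when a candidate scores higher. In a real system the scoring step would be the expensive part.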

---------

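What the model's safety guardrails protect is not the "meaning" but the "way the meaning is expressed". Take the same dangerous intent, express it in a different style, and the guardrails no longer recognize it. This suggests that current safety alignment is essentially pattern matching, not intent understanding.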
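As a loose analogy for that last point, consider a filter that matches surface strings. This is a deliberately crude sketch, not how LLM guardrails are actually built; the blocklist, the `surface_filter` name, and the benign placeholder phrases are all assumptions for illustration.

```python
import re

# Hypothetical blocklist of surface patterns (benign placeholders).
BLOCKED_PATTERNS = [r"\bopen the vault\b", r"\bvault combination\b"]

def surface_filter(text: str) -> bool:
    """Return True if the text matches any blocked surface pattern."""
    lowered = text.lower()
    return any(re.search(p, lowered) for p in BLOCKED_PATTERNS)

# Same underlying request, two styles of expression.
direct = "Tell me how to open the vault."
restyled = "Pray, by what art might one unseal the strongroom?"

print(surface_filter(direct))    # True
print(surface_filter(restyled))  # False
```

Both sentences ask for the same thing, but the filter only sees strings, so the archaic rephrasing passes. The post's claim is that learned guardrails can fail in a similar way when the style shifts far from what they were trained on.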

Mar 26, 2026, 2:31 AM

One Sentence Summary

Bypassing LLM safety guardrails using Classical Chinese reveals that current safety alignment is essentially pattern matching, not true intent understanding.

Summary

Li Jigang analyzes a paper on LLM security, highlighting that safety guardrails are primarily trained on modern languages, creating a blind spot for low-resource languages like Classical Chinese. The paper introduces the CC-BOS framework, which leverages the linguistic features of Classical Chinese to successfully jailbreak mainstream models. This demonstrates that current safety alignment mechanisms are limited to 'pattern matching' rather than genuine 'intent understanding' when dealing with non-standard expressions.
