
PaddleOCR Overtakes Google Tesseract in GitHub Stars to Become No. 1 Worldwide

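📅 2026-03-30 17:13 Berryxia.AI Artificial Intelligence 3 min read 3562 characters Score: 82
PaddleOCR OCR Open Source Baidu Data-Centric
📌 One-sentence summary: PaddleOCR has earned 73.3k+ stars on GitHub, overtaking Google Tesseract as the world's most popular OCR project; the post also summarizes its technical innovations. 📝 Detailed summary: PaddleOCR has reached a major open-source milestone, with its star count surpassing that of Google Tesseract. The tweet walks through its technical strengths: PP-OCRv5 uses a systematic Data-Centric optimization strategy to reach high performance with very few parameters; it introduces the "difficulty sweet spot" training method; and it supports structured Markdown/JSON output suited to RAG scenarios. The project has become the core engine of mainstream tools such as MinerU and RAGFlow. 📊 Article info: AI score: 82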

Title: PaddleOCR Surpasses Google Tesseract to Become #1 Globall...

URL Source: https://www.bestblogs.dev/status/2038545095809732962

Published Time: 2026-03-30 09:13:19

Markdown Content:

PaddleOCR Surpasses Google Tesseract to Become #1 Globally


### Berryxia.AI

@berryxia

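🔥 Folks, the OCR leaderboard on GitHub has a new king, and this time it isn't Google...

A historic moment 👇

With 73.3k+ stars, PaddleOCR has overtaken Google Tesseract to become the most-starred OCR project in the world!

And it comes from the Baidu team!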

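🎯 Why was it able to overtake Google?

Technical highlights:

• PP-OCRv5 is an ultra-lightweight OCR system with only 5M parameters. Its key strength is a systematic, data-centric optimization strategy, which lets its OCR performance match or even exceed vision-language models (VLMs) with hundreds of billions of parameters, such as GPT-4o

• Core innovation: the "difficulty sweet spot" training method. Much like setting exercises for a child, it finds the difficulty range in which the model learns best

• Data-Centric optimization: the data strategy is rebuilt along three dimensions: difficulty, accuracy, and diversity

• Cost reduced by 99%, inference 50x faster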
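The "difficulty sweet spot" idea in the list above can be illustrated with a small data-selection sketch. The post does not say how PP-OCRv5 measures difficulty or where the band lies, so everything here is an assumption: difficulty is taken as one minus a baseline model's confidence, and `Sample`, `select_sweet_spot`, and the `low`/`high` thresholds are hypothetical names and values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    image_id: str
    # Hypothetical field: a baseline model's confidence on this sample, in [0, 1].
    confidence: float


def select_sweet_spot(samples, low=0.3, high=0.8):
    """Keep samples whose difficulty falls in an intermediate band.

    Difficulty is assumed to be 1 - confidence. Samples below `low` are
    treated as too easy to teach much; samples above `high` are treated as
    too hard (or possibly mislabeled). Both thresholds are illustrative.
    """
    kept = []
    for sample in samples:
        difficulty = 1.0 - sample.confidence
        if low <= difficulty <= high:
            kept.append(sample)
    return kept


samples = [
    Sample("easy", 0.99),
    Sample("medium", 0.60),
    Sample("hard", 0.45),
    Sample("unreadable", 0.05),
]
selected = select_sweet_spot(samples)
print([s.image_id for s in selected])  # → ['medium', 'hard']
```

In this toy setup the very easy and the near-unreadable samples are dropped, which is one plausible reading of "finding the difficulty range in which the model learns best"; the actual procedure may differ.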

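Practical value:

• Outputs structured Markdown/JSON directly, which plugs straight into LLM/RAG pipelines

• Supports 100+ languages, covering 95%+ of the world's population

• Serves as the core engine of leading projects such as MinerU and RAGFlow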
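To make the Markdown/JSON point concrete, here is a minimal sketch of turning structured OCR output into chunks a RAG pipeline could index. The JSON schema below is invented for illustration and is not PaddleOCR's actual output format; `block_to_markdown` and `to_chunks` are likewise hypothetical helpers, not library functions.

```python
import json

# Hypothetical OCR result. This schema is an assumption made for the example
# and does not reflect PaddleOCR's real output format.
ocr_json = """
{
  "page": 1,
  "blocks": [
    {"type": "title", "text": "Invoice 2026-03"},
    {"type": "paragraph", "text": "Payment is due within 30 days."},
    {"type": "table", "rows": [["Item", "Price"], ["Widget", "9.99"]]}
  ]
}
"""


def block_to_markdown(block):
    """Render one layout block as a Markdown string."""
    kind = block["type"]
    if kind == "title":
        return "# " + block["text"]
    if kind == "table":
        header, *body = block["rows"]
        lines = [
            "| " + " | ".join(header) + " |",
            "| " + " | ".join("---" for _ in header) + " |",
        ]
        lines += ["| " + " | ".join(row) + " |" for row in body]
        return "\n".join(lines)
    # Paragraphs and any unknown block types fall back to plain text.
    return block.get("text", "")


def to_chunks(doc):
    """Produce one chunk per block, with metadata a retriever could filter on."""
    return [
        {"page": doc["page"], "type": b["type"], "text": block_to_markdown(b)}
        for b in doc["blocks"]
    ]


chunks = to_chunks(json.loads(ocr_json))
for chunk in chunks:
    print(chunk["type"], "->", repr(chunk["text"]))
```

Keeping the block type and page number alongside each chunk is a design choice for this sketch: it lets a retriever treat tables and titles differently from body text.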

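This is the first time a Chinese open-source project has surpassed Google's benchmark project in the OCR field.

73.3k+ stars: developers worldwide have voted with their feet, and the results speak for themselves!

👁 Project link in the comments 👇🏻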


Mar 30, 2026, 9:13 AM

12 Replies

10 Retweets

54 Likes

5,317 Views

One Sentence Summary

PaddleOCR has reached 73.3k+ stars on GitHub, surpassing Google Tesseract as the world's most popular OCR project, driven by its technical innovations.

Summary

PaddleOCR has achieved a significant open-source milestone by surpassing Google Tesseract in GitHub stars. The tweet details its technical advantages: PP-OCRv5 uses a systematic Data-Centric optimization strategy to achieve high performance with minimal parameters; it introduces the 'Difficulty Sweet Spot' training method; and it supports structured Markdown/JSON output, which suits RAG scenarios. It has become the core engine for mainstream tools like MinerU and RAGFlow.

AI Score

82

Influence Score 34

Published At Today

Language

Chinese

Tags

PaddleOCR

OCR

Open Source

Baidu

Data-Centric

Published: 2026-03-30 17:13:19 Indexed: 2026-03-30 20:00:15
