pdf2epub-paddle: A PaddleOCR-based Tool for Converting Scanned PDFs to EPUB
===========================================================================

### GitHubDaily
@GitHub_Daily
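Converting a scanned PDF into an e-book with an ordinary OCR tool leaves paragraphs scrambled and the layout poor, and chapters still have to be organized by hand, which is time-consuming and laborious.
I came across pdf2epub-paddle, a practical tool that uses OCR to convert scanned PDFs into high-quality EPUB e-books.
It is built on Baidu's PaddleOCR: it intelligently recognizes paragraphs, images, and tables, and it automatically removes the headers and footers that get in the way of reading.
GitHub: github.com/jarodise/pdf2e…
It supports smart chapter splitting and table-of-contents generation, and conversion can resume from where it stopped, so a network hiccup does not throw away the work already done.
Deployment is very simple: uv is the recommended way to manage it, and all you need is an API Token to start converting, giving scanned books the reading experience of a native e-book.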
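The tweet does not say how pdf2epub-paddle implements resumable conversion. As a rough illustration of the general idea only, the sketch below saves each page's OCR result to a JSON checkpoint so that a rerun skips pages that already succeeded. The names `convert_with_resume`, `ocr_page`, and `checkpoint_path` are hypothetical and are not taken from the tool.

```python
import json
from pathlib import Path


def convert_with_resume(pages, ocr_page, checkpoint_path):
    """Run ocr_page on every page, skipping pages already in the checkpoint.

    Hypothetical helper for illustration; not pdf2epub-paddle's actual code.
    `pages` is a list of page inputs, `ocr_page` is any callable that
    returns JSON-serializable output for one page.
    """
    checkpoint = Path(checkpoint_path)
    # Load results from an earlier, interrupted run if a checkpoint exists.
    if checkpoint.exists():
        done = json.loads(checkpoint.read_text(encoding="utf-8"))
    else:
        done = {}
    for index, page in enumerate(pages):
        key = str(index)  # JSON object keys are always strings
        if key in done:
            continue  # already processed in a previous run
        done[key] = ocr_page(page)
        # Persist after each page so a failure loses at most one page of work.
        checkpoint.write_text(json.dumps(done, ensure_ascii=False), encoding="utf-8")
    return [done[str(i)] for i in range(len(pages))]
```

If `ocr_page` raises partway through (for example on a dropped connection), calling `convert_with_resume` again with the same `checkpoint_path` continues from the first unfinished page.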
Mar 11, 2026, 4:00 AM
2 Replies
20 Retweets
65 Likes
5,530 Views
One Sentence Summary
pdf2epub-paddle is an open-source tool that leverages Baidu's PaddleOCR technology to convert scanned PDFs into high-quality EPUB e-books.
Summary
This tweet recommends pdf2epub-paddle, a practical tool designed to solve common pain points when converting scanned PDFs to e-books, such as scrambled paragraphs, poor layout, and manual chapter organization. Built on Baidu's PaddleOCR, the tool recognizes paragraphs, images, and tables while automatically stripping headers and footers. Key features include smart chapter splitting, table-of-contents generation, and resumable conversion. According to the tweet, deployment is simple: uv is the recommended way to manage it, and only an API Token is needed to start converting. The goal is a native e-book reading experience for scanned books.
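The source does not describe how headers and footers are detected. One common heuristic, shown here purely as an assumption and not as the tool's method, is to treat a page's first or last line as a running header or footer when the same text appears in that position on most pages. The function name `strip_running_lines` and the `min_share` threshold are hypothetical.

```python
from collections import Counter


def strip_running_lines(pages, min_share=0.6):
    """Drop first/last lines that repeat on at least min_share of pages.

    Hypothetical heuristic for illustration; `pages` is a list of pages,
    each page being a list of text lines in reading order.
    """
    edge_counts = Counter()
    for lines in pages:
        if lines:
            # A set avoids double counting when a page has a single line.
            edge_counts.update({lines[0], lines[-1]})
    threshold = min_share * len(pages)
    repeated = {line for line, n in edge_counts.items() if n >= threshold}

    cleaned = []
    for lines in pages:
        body = list(lines)
        if body and body[0] in repeated:
            body = body[1:]  # remove running header
        if body and body[-1] in repeated:
            body = body[:-1]  # remove running footer
        cleaned.append(body)
    return cleaned
```

Exact-match counting like this misses footers that change on every page, such as page numbers; handling those would need an extra rule (for example, a pattern for digit-only lines).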
AI Score
82
Influence Score 37
Published At Today
Language
Chinese
Tags
PDF Conversion
EPUB
PaddleOCR
Open Source Tool
OCR Technology