OCRFlux: A lightweight tool for efficiently converting complex PDFs into structured Markdown.
OCRFlux is a lightweight parsing tool based on a multimodal large language model, designed to convert PDFs and images into high-quality structured Markdown. It preserves the original document structure while accurately handling multi-column layouts, complex tables, and mathematical formulas, and it automatically removes headers and footers and merges content that continues across pages.
Core technological advantages and performance
Excellent parsing quality
On the OCRFlux-bench-single benchmark, OCRFlux achieves a higher edit distance similarity (EDS) than comparable tools: approximately 0.095 higher than olmOCR-7B-0225-preview, approximately 0.109 higher than Nanonets-OCR-s, and nearly 0.187 higher than MonkeyOCR. This lead stems primarily from its deep optimization for parsing complex tables, including cells that span multiple rows and columns.
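To make the metric concrete, here is a minimal Python sketch of edit distance similarity. It assumes the common definition, 1 minus the Levenshtein distance divided by the length of the longer string; the benchmark's exact normalization and text preprocessing are not stated here and may differ. The function names are illustrative and not part of OCRFlux.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def edit_distance_similarity(predicted: str, reference: str) -> float:
    """Assumed definition: 1 - distance / max(len). 1.0 means identical."""
    longest = max(len(predicted), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(predicted, reference) / longest
```

Under this assumed definition scores lie between 0 and 1, so a gain of about 0.1 is an absolute difference on that scale.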
Breakthrough cross-page merging capability
As the first open-source document parsing tool to support native cross-page table and paragraph merging, OCRFlux can automatically detect and integrate content distributed across multiple pages, ensuring the logical coherence of the document. Real-world testing shows that its cross-page merging recognition accuracy reaches as high as 98.3%.
Lightweight deployment and ultra-fast processing
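The model has only 3B (3 billion) parameters and runs about three times faster on an RTX 3090 GPU than a 7B-parameter baseline. This greatly lowers the barrier to deployment and improves processing efficiency while maintaining high accuracy.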
Feature Overview
- Full-scenario parsing: Automatically recognizes the natural reading order and handles multi-column layouts and mixed text-and-image layouts.
- Complex element recognition: Supports high-precision extraction of mathematical formulas and complex table structures.
- Intelligent content cleaning: Automatically filters redundant information such as headers and footers.
- Structured output: Paragraphs and tables that span multiple pages are automatically merged, producing clean, continuous Markdown.
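Use cases
Thanks to its strong handling of complex layouts, OCRFlux is particularly well suited to content-dense scenarios such as digitizing research papers, parsing complex corporate financial statements, and converting technical standards documents.
Quick start and resources
Users can quickly test the parsing results through the online demo, or get the source code from the GitHub repository for integration and development.
- Online demo: https://ocrflux.pdfparser.io/
- GitHub repository: https://github.com/chatdoc-com/OCRFlux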


