
Intelligent Document Parsing Approaches: Summary and Progress Update (OCR pipeline, layout + VLM, and pure multimodal end-to-end parsing)


Large Models and Natural Language Processing (大模型自然语言处理)

2025-11-13


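Many new open-source document parsing projects have appeared recently, so here is another progress update. Many of the models and technical approaches mentioned below are covered in the "Document Intelligence Column" (《文档智能专栏》).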

OCR-pipeline document parsing (layout + reading order + small expert OCR models)

MinerU1.x: https://github.com/opendatalab/MinerU
PP-StructureV3: https://github.com/PaddlePaddle/PaddleOCR/blob/main/docs/version3.x/algorithm/PP-StructureV3/PP-StructureV3.md
Docling: https://github.com/docling-project/docling
Marker: https://github.com/VikParuchuri/marker

...

Summary: OCR pipelines are highly interpretable and closer to a production-ready solution, but their ability to generalize is limited.
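As a rough illustration of the pipeline structure named in this section's heading (layout, then reading order, then one expert recognizer per region type), here is a minimal sketch. Every name in it (`Region`, `detect_layout`, `sort_reading_order`, `RECOGNIZERS`, `parse_page`) is hypothetical and the models are replaced by stubs; it is not the API of MinerU, PP-StructureV3, Docling, or Marker.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Region:
    kind: str                        # "text", "table", or "formula" in this sketch
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1)


def detect_layout(page: str) -> List[Region]:
    # Hypothetical stand-in for a layout detection model; returns fixed regions.
    return [
        Region("table", (50, 400, 550, 700)),
        Region("text", (50, 50, 550, 120)),
        Region("formula", (50, 150, 550, 350)),
    ]


def sort_reading_order(regions: List[Region]) -> List[Region]:
    # Naive single-column ordering: top to bottom, then left to right.
    return sorted(regions, key=lambda r: (r.bbox[1], r.bbox[0]))


# One hypothetical "expert" recognizer per region type (stubs, not real models).
RECOGNIZERS: Dict[str, Callable[[str, Region], str]] = {
    "text": lambda page, r: f"text from {page} at {r.bbox}",
    "table": lambda page, r: f"table from {page} at {r.bbox}",
    "formula": lambda page, r: f"formula from {page} at {r.bbox}",
}


def parse_page(page: str) -> List[Tuple[str, str]]:
    # Stage 1: layout, stage 2: reading order, stage 3: per-region recognition.
    regions = sort_reading_order(detect_layout(page))
    return [(r.kind, RECOGNIZERS[r.kind](page, r)) for r in regions]


if __name__ == "__main__":
    for kind, content in parse_page("page-1"):
        print(kind, "->", content)
```

Because each stage is an explicit function, its intermediate output can be inspected or swapped on its own, which is one way to read the interpretability point above. A region type with no entry in `RECOGNIZERS` raises a `KeyError` in this sketch, a small-scale analogue of the limited generalization.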

Layout+VLM

MinerU2.5 (1.2B): https://github.com/opendatalab/MinerU
MonkeyOCR (1.2B~3B): https://github.com/Yuliang-Liu/MonkeyOCR
PaddleOCR-VL (0.9B): https://github.com/PaddlePaddle/PaddleOCR
chandra (8B): https://github.com/datalab-to/chandra

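Some of these pair a conventional object-detection model with a VLM that parses the content of each detected region; in others, a single model handles both detection and recognition.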
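The two variants can be contrasted in a short sketch. All functions here (`fake_detector`, `vlm_layout`, `vlm_recognize`, `detector_then_vlm`, `single_model`) are hypothetical stubs invented for illustration and do not correspond to the interface of any project listed above; the only point is where the region boxes come from.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)

# Fixed boxes shared by both stubs so the two flows are easy to compare.
_BOXES: List[Box] = [(0, 0, 100, 20), (0, 30, 100, 90)]


def fake_detector(page: str) -> List[Box]:
    # Stand-in for a conventional object-detection layout model.
    return list(_BOXES)


def vlm_layout(page: str) -> List[Box]:
    # Stand-in for a VLM asked to output the layout boxes itself.
    return list(_BOXES)


def vlm_recognize(page: str, box: Box) -> str:
    # Stand-in for a VLM asked to transcribe one cropped region.
    return f"content of {page} in {box}"


def detector_then_vlm(page: str) -> List[str]:
    # Variant 1: a separate detector finds regions, the VLM reads each one.
    return [vlm_recognize(page, box) for box in fake_detector(page)]


def single_model(page: str) -> List[str]:
    # Variant 2: the same VLM does both detection and recognition.
    return [vlm_recognize(page, box) for box in vlm_layout(page)]


if __name__ == "__main__":
    print(detector_then_vlm("page-1"))
    print(single_model("page-1"))
```

With identical stub boxes the two flows return the same result; in a real system the difference lies in whether layout quality depends on a separate detector or on the VLM itself.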

Multimodal end-to-end document parsing (fine-tuned)

Dolphin: https://github.com/bytedance/Dolphin
olmOCR: https://github.com/allenai/olmocr
GOT-OCR: https://github.com/Ucas-HaoranWei/GOT-OCR2.0
SmolDocling: https://huggingface.co/ds4sd/SmolDocling-256M-preview
Unstructured: https://github.com/Unstructured-IO/unstructured
OpenParse: https://github.com/Filimoa/open-parse
Mistral-OCR: https://mistral.ai/news/mistral-ocr?utm_source=ai-bot.cn
Nougat: https://github.com/facebookresearch/nougat
DeepSeek-OCR: https://github.com/deepseek-ai/DeepSeek-OCR

...

Representative general-purpose multimodal large models

GPT-4o
Gemini
Qwen2.5-VL-72B

...


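Occasional posts on frontier techniques and practice in natural language processing, large language models, document intelligence, and related areas. Author: Lao Yu (老余), who has placed first, second, or third in nearly twenty algorithm competitions and evaluations in China and abroad, including CCF, Kaggle, ICPR, and ICDAR, and has published multiple papers in SCI journals and at top conferences.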
