ArtifexSoftware / pdf2docx
Open source Python library for converting PDF to DOCX.
☆3,184 · Updated 6 months ago
Alternatives and similar repositories for pdf2docx
Users interested in pdf2docx are comparing it to the libraries listed below.
- 📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch. ☆5,313 · Updated 2 weeks ago
- Multithreaded PDF-to-Word conversion in 60 lines of code. ☆872 · Updated last year
- Collaboration with wangxupeng (https://github.com/wangxupeng). ☆1,948 · Updated last year
- PyMuPDF is a high-performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents. ☆8,554 · Updated this week
- pip install python-office: a library dedicated to office automation. ☆1,221 · Updated last week
- A Python module that wraps the pdftoppm utility to convert PDF to PIL Image objects. ☆1,911 · Updated last year
- CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scen… ☆3,694 · Updated 2 months ago
- Free Offline OCR: an offline SDK for Chinese text detection + recognition. ☆1,371 · Updated 2 weeks ago
- Open-source, easy-to-use offline Chinese OCR with recognition accuracy rivaling that of major tech companies. It also provides an easy-to-use web page and web API, convenient for everyday use by people or for calls from other programs. ☆2,833 · Updated 2 years ago
- Text blind watermarking: hide information by putting an invisible blind watermark into a text. ☆1,760 · Updated 2 months ago
- CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis. ☆770 · Updated 5 months ago
- An offline OCR command-line program for Windows that recognizes text in images and outputs the results as JSON strings, making it easy to call from other programs. Provides APIs for various languages. Compiled from PaddleOCR C++. ☆1,379 · Updated 7 months ago
- [In closed beta] QPT: a Python-to-EXE tool (Python packaging) dedicated to helping open-source projects better reach the wider internet. ☆783 · Updated 10 months ago
- A machine-learning image inpainting task that instinctively removes watermarks from images, with results indistinguishable from the ground-truth image. ☆4,191 · Updated 2 months ago
- Convert PDF to HTML without losing text or format. ☆5,305 · Updated 4 months ago
- Create animated bar chart races in Python with matplotlib. ☆1,436 · Updated last year
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. ☆9,153 · Updated 3 weeks ago
- Collects the best currently available open-source table recognition models, improves pre- and post-processing, and converts the models to ONNX. ☆888 · Updated 3 months ago
- Ultra-lightweight Chinese OCR with support for vertical text recognition and for ncnn, mnn, and tnn inference (dbnet (1.8 MB) + crnn (2.5 MB) + anglenet (378 KB)); the total model size is only 4.7 MB. ☆12,239 · Updated 2 years ago
- Blind & invisible watermarking for images; extracting the watermark does not require the original image! ☆11,387 · Updated 2 months ago
- mrdoc: an online document system developed in Python. It is suitable for individuals and small teams to manage documents, wiki, knowled… ☆3,153 · Updated last week
- "PDF Parsing" ☆1,092 · Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model. ☆8,022 · Updated 9 months ago
- Remove embedded watermarks and color stains from scanned PDFs. ☆188 · Updated 9 years ago
- A lightweight Python library for simulating Chinese handwriting. ☆2,202 · Updated last year
- Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables. ☆59 · Updated last year
- Python bindings for WPS Office RPC (for Linux). ☆272 · Updated 8 months ago
- Data APIs: Baidu, Google, Toutiao, and Weibo indexes; macroeconomic data; interest-rate data; currency exchange rates; "qianlima" (fast-growing) and unicorn companies; Xinwen Lianbo transcripts; film and TV box-office data; university lists; epidemic data… ☆2,564 · Updated 2 years ago
- A crawler for WeChat Official Account articles. ☆3,298 · Updated last year
- ChatLaw: a powerful large language model tailored for Chinese law. ☆7,367 · Updated 10 months ago