ArtifexSoftware / pdf2docx
Open source Python library for converting PDF to DOCX.
☆2,907Updated 2 weeks ago
Alternatives and similar repositories for pdf2docx:
Users who are interested in pdf2docx are comparing it to the libraries listed below:
- A Python module that wraps the pdftoppm utility to convert PDFs to PIL Image objects☆1,770Updated 9 months ago
- PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.☆7,089Updated last week
- A Python library for reading and writing PDF, powered by QPDF☆2,332Updated 2 weeks ago
- "PDF Parsing"☆1,040Updated 9 months ago
- Text blind watermark: hides information inside text by putting an invisible blind watermark into it.☆1,519Updated last month
- An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them…☆2,381Updated last week
- CnSTD: a Python3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis☆738Updated 2 months ago
- ☆709Updated last month
- 📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, PaddlePaddle and PyTorch.☆4,012Updated last week
- Blind & Invisible Watermark: blind image watermarking; extracting the watermark does not require the original image!☆6,382Updated 10 months ago
- A lightweight OCR system rebuilt on PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆967Updated last month
- Community maintained fork of pdfminer - we fathom PDF☆6,431Updated last week
- Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and …☆26,534Updated 7 months ago
- A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files☆9,007Updated last week
- Create Open XML PowerPoint documents in Python☆2,740Updated 9 months ago
- A small visual drag-and-drop layout and interface design tool built for tkinter☆654Updated 2 years ago
- CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scen…☆3,530Updated 5 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.☆2,626Updated 2 months ago
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…☆1,703Updated last month
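- An offline OCR command-line program for Windows that recognizes text in images and outputs the results as JSON strings, making it easy for other programs to call. Provides APIs for various languages. Compiled from PaddleOCR C++.☆1,166Updated last month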
- Tesseract documentation☆2,025Updated 3 months ago
- Using GPT to parse PDF☆3,396Updated 3 weeks ago
- [Gap-Tree-Sort Algorithm] Performs layout analysis on OCR results or text extracted from PDFs and sorts it in human reading order.☆131Updated last year
- A machine learning image inpainting task that instinctively removes watermarks from images, producing output indistinguishable from the ground truth image☆3,444Updated 8 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages☆17,307Updated last week
- Collaboration with wangxupeng (https://github.com/wangxupeng)☆1,888Updated 8 months ago
- pix2tex: Using a ViT to convert images of equations into LaTeX code.☆14,248Updated 3 months ago
- Best (most accurate) trained LSTM models.☆1,334Updated last year
- Collects the best open-source table recognition models currently available, improves pre- and post-processing, and converts the models to ONNX☆665Updated last month
- Rich text editor built with canvas/svg☆4,140Updated this week