ArtifexSoftware / pdf2docx
Open source Python library for converting PDF to DOCX.
☆3,290Updated 8 months ago
Alternatives and similar repositories for pdf2docx
Users who are interested in pdf2docx are comparing it to the libraries listed below.
- 📄 Awesome OCR toolkits for multiple programming languages, based on ONNXRuntime, OpenVINO, MNN, PaddlePaddle and PyTorch.☆5,916Updated this week
- Collaboration with wangxupeng (https://github.com/wangxupeng)☆1,963Updated last year
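- Open-source, easy-to-use offline Chinese OCR with recognition accuracy comparable to that of major vendors; it also provides an easy-to-use web page and web API, convenient for everyday work or for calls from other programs☆2,857Updated 2 years ago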
- A Python module that wraps the pdftoppm utility to convert PDF to a PIL Image object☆1,933Updated last year
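- Offline OCR command-line program for Windows that recognizes text in images and outputs the results as a JSON string for easy calling from other programs. Provides APIs for various languages. Compiled from PaddleOCR C++.☆1,427Updated 10 months ago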
- CnSTD: a Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis☆781Updated 7 months ago
- Download Poppler binaries packaged for Windows with dependencies☆1,146Updated last month
- [In closed beta] QPT: a Python-to-EXE tool (Python packaging) dedicated to helping open source projects reach the wider internet world.☆789Updated this week
- Free Offline OCR: an offline SDK for Chinese text detection and recognition☆1,373Updated 3 weeks ago
- Demos, examples and utilities using PyMuPDF☆706Updated last month
- Convert PDF to HTML without losing text or format.☆5,404Updated 6 months ago
- Text blind watermark: hide information in text by putting an invisible blind watermark into it.☆1,806Updated 5 months ago
- An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them…☆3,001Updated this week
- A lightweight Python library for simulating Chinese handwriting☆2,218Updated last year
- CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scen…☆3,728Updated 4 months ago
- Collects the best currently open-source table recognition models, improves pre- and post-processing, and converts the models to ONNX☆920Updated 6 months ago
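- PDF补丁丁: a PDF toolbox that can edit bookmarks, crop and rotate pages, remove restrictions, extract or merge documents, inspect document structure, extract images, convert pages to images, and more☆12,113Updated 2 weeks ago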
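- Offline OCR command-line program for Windows that recognizes text in images and outputs the results as a JSON string for easy calling from other programs. Based on RapidOcrOnnx.☆324Updated 2 years ago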
- img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing☆850Updated 3 months ago
- A Python library to extract tabular data from PDFs☆3,589Updated last week
- Using GPT to parse PDF☆3,562Updated 9 months ago
- A lightweight OCR system refactored from PaddleOCR and decoupled from the PaddlePaddle deep learning training framework, with very fast inference☆1,667Updated 3 months ago
- Python bindings for WPS Office RPC (for Linux)☆283Updated 10 months ago
- yolo3+ocr☆6,119Updated 3 years ago
- Blind & Invisible Watermark: blind watermarking for images; extracting the watermark does not require the original image!☆12,090Updated 5 months ago
- ☆881Updated 2 months ago
- Python bindings to PDFium, reasonably cross-platform.☆721Updated this week
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆1,981Updated 9 months ago
- Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats☆4,254Updated 2 weeks ago
- 🎨 Regex visualizer & editor☆4,239Updated last month