An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
☆3,148Feb 7, 2026Updated 4 months ago
Alternatives and similar repositories for Pix2Text
Users that are interested in Pix2Text are comparing it to the libraries listed below.
- pix2tex: Using a ViT to convert images of equations into LaTeX code.☆16,453Jan 18, 2025Updated last year
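- Enhanced math formula recognition: handwritten and printed formulas in Chinese and English, with support for basic symbolic derivation (data structure based on a LaTeX abstract syntax tree). Math Formula OCR Pro, supports handwritten, Chinese-mixed formulas and simple symbol reaso…☆1,300Jun 11, 2024Updated 2 years ago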
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,…☆741Aug 22, 2025Updated 9 months ago
- Formula recognition based on LaTeX-OCR and ONNXRuntime.☆387Nov 3, 2024Updated last year
- CnSTD: a Python3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis☆792May 1, 2026Updated last month
- MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows.☆1,624Apr 24, 2025Updated last year
- Implementation of Nougat Neural Optical Understanding for Academic Documents☆10,007Feb 21, 2025Updated last year
- Math OCR model that outputs LaTeX and markdown☆1,123Jan 29, 2025Updated last year
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition☆483Sep 28, 2025Updated 8 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction☆9,715Jan 3, 2025Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model☆8,137Feb 10, 2025Updated last year
- Chinese Mathematical Formula Detection (MFD) Dataset: a dataset for detecting mathematical formulas in Chinese documents☆34Dec 21, 2022Updated 3 years ago
- Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows.☆66,927Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy☆35,896Updated this week
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models.☆1,891Dec 30, 2024Updated last year
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team…☆1,829Mar 17, 2026Updated 2 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task☆277Dec 6, 2025Updated 6 months ago
- Math formula recognition (Math Formula OCR)☆549Mar 24, 2023Updated 3 years ago
- Using GPT to parse PDF☆3,554Apr 17, 2025Updated last year
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception☆2,177Apr 14, 2025Updated last year
- OCR, layout analysis, reading order, table recognition in 90+ languages☆20,618Jun 2, 2026Updated last week
- Convert images of LaTeX math equations into LaTeX code.☆2,162Oct 4, 2022Updated 3 years ago
- Markdown rendering + Latex extras (equations, tables, ...), with conversion features, for the scientific community☆668May 26, 2026Updated 2 weeks ago
- 1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection (champion solution for formula detection)☆134Sep 4, 2023Updated 2 years ago
- CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scen…☆3,753Feb 7, 2026Updated 4 months ago
- 360LayoutAnalysis, a series of Document Analysis Models and Datasets developed by 360 AI Research Institute☆306Sep 10, 2024Updated last year
- [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved, supporting services such as Google/DeepL/Ollama/OpenAI,…☆34,707May 25, 2026Updated 2 weeks ago
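- Provides a practical interaction interface for LLMs such as GPT/GLM, specially optimized for paper reading/polishing/writing, with a modular design, support for custom shortcut buttons & function plugins, project analysis & self-explanation for Python, C++, and other projects, PDF/LaTeX paper translation & summarization, parallel queries to multiple LLMs, and local models such as chatglm3…☆70,864Jan 25, 2026Updated 4 months ago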
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models☆160Sep 25, 2024Updated last year
- End-to-end image-to-LaTeX formula recognition implemented in PyTorch, inspired by LinXueyuanStdio/LaTeX_OCR_PRO☆179Apr 6, 2020Updated 6 years ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding☆2,406May 30, 2025Updated last year
- OCR software, free and offline. Open-source, free, offline OCR software. Supports screenshots and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Includes built-in multilingual libraries.☆44,980Nov 20, 2025Updated 6 months ago
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition (ECCV’2022 Poster).☆387Aug 5, 2024Updated last year
- Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/…☆81,097Jun 4, 2026Updated last week
- Data repository for LaTeX OCR☆140Jun 11, 2024Updated 2 years ago
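- Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow, using ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses☆19,571Mar 2, 2026Updated 3 months ago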
- translate scientific papers in latex, especially arxiv papers☆1,362Sep 26, 2025Updated 8 months ago
- 📄 Awesome OCR toolkits for multiple programming languages, based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch.☆6,770May 22, 2026Updated 2 weeks ago
- FormulaNet is a new large-scale Mathematical Formula Detection dataset.☆21Nov 21, 2022Updated 3 years ago