An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
☆3,123 · Feb 7, 2026 · Updated 3 months ago
Alternatives and similar repositories for Pix2Text
Users that are interested in Pix2Text are comparing it to the libraries listed below.
- pix2tex: Using a ViT to convert images of equations into LaTeX code. ☆16,390 · Jan 18, 2025 · Updated last year
- Enhanced math formula recognition: handwritten and printed formulas in Chinese and English, with support for elementary symbolic derivation (data structures based on the LaTeX abstract syntax tree). Math Formula OCR Pro, supports handwritten, Chinese-mixed formulas and simple symbol reaso… ☆1,295 · Jun 11, 2024 · Updated last year
- TexTeller can convert images to LaTeX formulas (image2latex, LaTeX OCR) with higher accuracy and exhibits superior generalization ability,… ☆742 · Aug 22, 2025 · Updated 8 months ago
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆385 · Nov 3, 2024 · Updated last year
- CnSTD: A Python 3 package based on PyTorch/MXNet for Chinese/English Scene Text Detection, Mathematical Formula Detection (MFD), and Layout Analysis. ☆791 · May 1, 2026 · Updated 3 weeks ago
- MixTeX: multimodal LaTeX, ZhEn, and table OCR. It performs efficient CPU-based inference locally and offline on Windows. ☆1,621 · Apr 24, 2025 · Updated last year
- Implementation of Nougat: Neural Optical Understanding for Academic Documents. ☆9,974 · Feb 21, 2025 · Updated last year
- Math OCR model that outputs LaTeX and markdown. ☆1,119 · Jan 29, 2025 · Updated last year
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition. ☆476 · Sep 28, 2025 · Updated 7 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction. ☆9,669 · Jan 3, 2025 · Updated last year
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model. ☆8,120 · Feb 10, 2025 · Updated last year
- Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your Agentic workflows. ☆63,345 · Updated this week
- Chinese Mathematical Formula Detection (MFD) Dataset: a math formula detection dataset for Chinese documents. ☆34 · Dec 21, 2022 · Updated 3 years ago
- Convert PDF to markdown + JSON quickly with high accuracy. ☆35,144 · May 5, 2026 · Updated 2 weeks ago
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models. ☆1,890 · Dec 30, 2024 · Updated last year
- A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team… ☆1,828 · Mar 17, 2026 · Updated 2 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task. ☆277 · Dec 6, 2025 · Updated 5 months ago
- Math Formula OCR. ☆551 · Mar 24, 2023 · Updated 3 years ago
- Using GPT to parse PDF. ☆3,553 · Apr 17, 2025 · Updated last year
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception. ☆2,166 · Apr 14, 2025 · Updated last year
- OCR, layout analysis, reading order, table recognition in 90+ languages. ☆19,756 · May 6, 2026 · Updated 2 weeks ago
- Convert images of LaTeX math equations into LaTeX code. ☆2,160 · Oct 4, 2022 · Updated 3 years ago
- Markdown rendering + LaTeX extras (equations, tables, ...), with conversion features, for the scientific community. ☆666 · May 12, 2026 · Updated last week
- 1st-place solution for the ICDAR 2021 Competition on Mathematical Formula Detection. ☆134 · Sep 4, 2023 · Updated 2 years ago
- CnOCR: Awesome Chinese/English OCR Python toolkits based on PyTorch. It comes with 20+ well-trained models for different application scen… ☆3,752 · Feb 7, 2026 · Updated 3 months ago
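- 360LayoutAnalysis, a series of document analysis models and datasets developed by 360 AI Research Institute. ☆307 · Sep 10, 2024 · Updated last year
- [EMNLP 2025 Demo] PDF scientific paper translation with preserved formats: AI-based full-text bilingual translation of PDF documents that fully preserves the layout, supporting services such as Google/DeepL/Ollama/OpenAI,… ☆33,828 · May 12, 2026 · Updated last week
- Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers; modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; support for local models such as chatglm3… ☆70,677 · Jan 25, 2026 · Updated 3 months ago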
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models. ☆160 · Sep 25, 2024 · Updated last year
- End-to-end image-to-LaTeX formula recognition implemented in PyTorch, inspired by LinXueyuanStdio/LaTeX_OCR_PRO. ☆179 · Apr 6, 2020 · Updated 6 years ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding. ☆2,406 · May 30, 2025 · Updated 11 months ago
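- Free, open-source, offline OCR software. Supports screenshot capture and batch image import, PDF document recognition, exclusion of watermarks and headers/footers, and scanning/generating QR codes. Multilingual language libraries are built in. ☆44,233 · Nov 20, 2025 · Updated 6 months ago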
- When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition (ECCV’2022 Poster). ☆387 · Aug 5, 2024 · Updated last year
- Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/… ☆77,858 · May 14, 2026 · Updated last week
- Data repository for LaTeX OCR. ☆139 · Jun 11, 2024 · Updated last year
- Use ChatGPT to summarize arXiv papers. Accelerates the whole research workflow, using ChatGPT for full-paper summarization, professional translation, polishing, reviewing, and review responses. ☆19,493 · Mar 2, 2026 · Updated 2 months ago
- Translate scientific papers in LaTeX, especially arXiv papers. ☆1,362 · Sep 26, 2025 · Updated 7 months ago
- 📄 Awesome OCR toolkits for multiple programming languages, based on ONNX Runtime, OpenVINO, MNN, PaddlePaddle, TensorRT and PyTorch. ☆6,573 · May 6, 2026 · Updated 2 weeks ago
- FormulaNet is a new large-scale Mathematical Formula Detection dataset. ☆21 · Nov 21, 2022 · Updated 3 years ago