VikParuchuri / texify
Math OCR model that outputs LaTeX and markdown
☆1,041 Updated 2 months ago
Alternatives and similar repositories for texify:
Users who are interested in texify are comparing it to the libraries listed below.
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆343 Updated 5 months ago
- Extract structured text from pdfs quickly ☆469 Updated last month
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,… ☆521 Updated this week
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models ☆146 Updated 6 months ago
- An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them… ☆2,344 Updated last week
- Markdown rendering + Latex extras (equations, tables, ...), with conversion features, for the scientific community ☆593 Updated 2 weeks ago
- Detect and extract tables to markdown and csv ☆742 Updated 3 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆310 Updated last month
- UniTable: Towards a Unified Table Foundation Model ☆461 Updated 10 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,133 Updated last week
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆480 Updated 3 weeks ago
- TF-ID: Table/Figure IDentifier for academic papers ☆232 Updated 9 months ago
- Lightweight, performant, deep table extraction ☆453 Updated 3 weeks ago
- 🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM. ☆538 Updated 2 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆210 Updated 11 months ago
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy. ☆839 Updated 6 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆231 Updated 4 months ago
- Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ ☆1,003 Updated 2 weeks ago
- LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator. ☆1,063 Updated this week
- A Python wrapper for the Doc2X API that comes with native text processing (to improve PDF recall in RAG). ☆256 Updated 2 months ago
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆368 Updated 2 weeks ago
- Implementation of Nougat Neural Optical Understanding for Academic Documents ☆9,415 Updated 2 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,623 Updated last month
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆336 Updated 2 years ago
- Large scale training of Latex formula recognition model, currently being organized and open source ☆52 Updated last year
- Whisper with Medusa heads ☆831 Updated this week
- Convert any PDF into a podcast episode! ☆728 Updated last month
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆100 Updated last month
- Python bindings to PDFium ☆560 Updated this week
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,120 Updated last week