VikParuchuri / texify
Math OCR model that outputs LaTeX and markdown
☆1,102 Updated 10 months ago
Alternatives and similar repositories for texify
Users who are interested in texify are comparing it to the libraries listed below.
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆373 Updated last year
- Extract structured text from pdfs quickly ☆632 Updated 6 months ago
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,… ☆647 Updated 3 months ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models ☆159 Updated last year
- An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them… ☆2,691 Updated 4 months ago
- Markdown rendering + Latex extras (equations, tables, ...), with conversion features, for the scientific community ☆640 Updated this week
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆435 Updated 2 months ago
- Lightweight, performant, deep table extraction ☆518 Updated 4 months ago
- Detect and extract tables to markdown and csv ☆755 Updated 10 months ago
- UniTable: Towards a Unified Table Foundation Model ☆514 Updated last year
- TF-ID: Table/Figure IDentifier for academic papers ☆241 Updated last year
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,822 Updated 7 months ago
- img2table is a table identification and extraction Python Library for PDF and images, based on OpenCV image processing ☆833 Updated last month
- library supporting NLP and CV research on scientific papers ☆783 Updated last year
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆272 Updated last week
- Parse PDFs into markdown using Vision LLMs ☆452 Updated 2 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,787 Updated 9 months ago
- 🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM. ☆595 Updated 6 months ago
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order. ☆288 Updated 3 months ago
- Large scale training of Latex formula recognition model, currently being organized and open source ☆56 Updated last year
- [CVPR 2025] A Comprehensive Benchmark for Document Parsing and Evaluation ☆1,268 Updated last week
- PyMuPDF4LLM ☆1,160 Updated last week
- YOLO models trained by DocLayNet - power your Document Intelligent by Layout Analysis ☆142 Updated 4 months ago
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus. ☆1,536 Updated 7 months ago
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆2,261 Updated 6 months ago
- LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator. ☆1,184 Updated this week
- Effort to open-source NLLB checkpoints. ☆467 Updated last year
- Benchmarking PDF libraries ☆315 Updated 5 months ago
- [CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks ☆530 Updated 4 months ago
- Python bindings to PDFium, reasonably cross-platform. ☆689 Updated this week