VikParuchuri / texify
Math OCR model that outputs LaTeX and markdown
☆1,037 Updated 2 months ago
Alternatives and similar repositories for texify:
Users who are interested in texify compare it with the libraries listed below.
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆339 Updated 5 months ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models ☆146 Updated 6 months ago
- Markdown rendering + Latex extras (equations, tables, ...), with conversion features, for the scientific community ☆587 Updated this week
- Extract structured text from pdfs quickly ☆452 Updated last month
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,… ☆504 Updated this week
- Lightweight, performant, deep table extraction ☆440 Updated this week
- TF-ID: Table/Figure IDentifier for academic papers ☆230 Updated 8 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆294 Updated last week
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆1,012 Updated last week
- Detect and extract tables to markdown and csv ☆734 Updated 2 months ago
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy. ☆829 Updated 5 months ago
- Fast Semantic Text Deduplication ☆597 Updated last week
- UniTable: Towards a Unified Table Foundation Model ☆452 Updated 10 months ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆223 Updated 3 months ago
- DocLayNet: A Large Human-Annotated Dataset for Document-Layout Analysis ☆332 Updated 2 years ago
- Large-scale training of a LaTeX formula recognition model; currently being organized and open-sourced ☆53 Updated 11 months ago
- This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats. ☆1,203 Updated this week
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆287 Updated this week
- Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ ☆827 Updated 2 weeks ago
- Python bindings to PDFium ☆552 Updated 2 weeks ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,595 Updated last month
- Python PDF parser for scientific publications: content and figures ☆400 Updated last year
- library supporting NLP and CV research on scientific papers ☆754 Updated 4 months ago
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus. ☆1,418 Updated 2 months ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆1,662 Updated this week
- Given a scholarly PDF, extract figures, tables, captions, and section titles. ☆649 Updated last year
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆757 Updated 2 months ago
- Improved file parsing for LLMs ☆2,888 Updated 4 months ago
- Whisper with Medusa heads ☆826 Updated last month
- 🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM. ☆538 Updated 2 months ago