VikParuchuri / texify
Math OCR model that outputs LaTeX and markdown
☆907 · Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for texify
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆304 · Updated 2 weeks ago
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,… ☆342 · Updated 3 months ago
- Extract structured text from pdfs quickly ☆340 · Updated 3 weeks ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models ☆124 · Updated last month
- Detect and extract tables to markdown and csv ☆633 · Updated this week
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆206 · Updated last month
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,186 · Updated 3 months ago
- DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception ☆497 · Updated 3 weeks ago
- TF-ID: Table/Figure IDentifier for academic papers ☆222 · Updated 4 months ago
- An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown for… ☆1,947 · Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆377 · Updated 5 months ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆617 · Updated last week
- Lightweight, performant, deep table extraction ☆333 · Updated 3 weeks ago
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆890 · Updated 2 months ago
- OpenResearcher, an advanced Scientific Research Assistant ☆408 · Updated last month
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy. ☆762 · Updated last month
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆1,415 · Updated this week
- nanoGPT style version of Llama 3.1 ☆1,246 · Updated 3 months ago
- This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats. ☆1,101 · Updated last month
- Vision model based document ingestion ☆1,242 · Updated this week
- LLM Analytics ☆615 · Updated last month
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆681 · Updated this week
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,095 · Updated last week
- Markdown rendering + Latex extras (equations, tables, ...), with conversion features, for the scientific community ☆535 · Updated this week
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic… ☆176 · Updated last week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆534 · Updated 2 weeks ago
- A High-efficiency Open-source Toolkit for Table-to-Latex Task ☆150 · Updated 2 weeks ago
- NanoGPT (124M) quality in 7.8 8xH100-minutes ☆1,033 · Updated this week
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆584 · Updated 6 months ago
- The code used to train and run inference with the ColPali architecture. ☆1,132 · Updated this week