VikParuchuri / texify
Math OCR model that outputs LaTeX and markdown
☆797 · Updated 2 months ago
Related projects:
- Extract structured text from pdfs quickly ☆292 · Updated last month
- Formula recognition based on LaTeX-OCR and ONNXRuntime. ☆278 · Updated last week
- RAG (Retrieval-Augmented Generation) Chatbot Examples Using PyMuPDF ☆282 · Updated this week
- TF-ID: Table/Figure IDentifier for academic papers ☆209 · Updated 2 months ago
- Codebase for fine-tuning / evaluating nougat-based image2latex generation models ☆115 · Updated 6 months ago
- TexTeller can convert image to latex formulas (image2latex, latex OCR) with higher accuracy and exhibits superior generalization ability,… ☆304 · Updated last month
- UniTable: Towards a Unified Table Foundation Model ☆340 · Updated 3 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,017 · Updated last month
- Lightweight, performant, deep table extraction ☆257 · Updated this week
- HTML to Markdown converter and crawler. ☆475 · Updated 8 months ago
- An Open-Source Python3 tool for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown for… ☆1,780 · Updated last month
- library supporting NLP and CV research on scientific papers ☆672 · Updated 5 months ago
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy. ☆701 · Updated 3 months ago
- UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition ☆164 · Updated last week
- Effort to open-source NLLB checkpoints. ☆415 · Updated 3 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆845 · Updated last week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆501 · Updated last month
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆522 · Updated 4 months ago
- 🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM. ☆467 · Updated 4 months ago
- LLM Analytics ☆593 · Updated last month
- Fast lexical search library implementing BM25 in Python using Numpy and Scipy ☆770 · Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆3,389 · Updated this week
- Reaching LLaMA2 Performance with 0.1M Dollars ☆957 · Updated last month
- Whisper with Medusa heads ☆774 · Updated last week
- Zero shot pdf OCR with gpt-4o-mini ☆1,392 · Updated last week
- High-performance retrieval engine for unstructured data ☆778 · Updated this week
- Improved file parsing for LLMs ☆2,361 · Updated this week
- WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus. ☆976 · Updated 2 weeks ago
- ☆449 · Updated 5 months ago
- Convert all of libgen to high quality markdown ☆240 · Updated 9 months ago