Dicklesworthstone / llm_aided_ocr
Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.
☆2,191 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for llm_aided_ocr
- Improved file parsing for LLMs ☆2,523 · Updated last week
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆697 · Updated last month
- Vision model based document ingestion ☆1,244 · Updated this week
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆1,477 · Updated this week
- PDF to Markdown with vision models ☆6,519 · Updated this week
- Detect and extract tables to markdown and csv ☆638 · Updated this week
- Open Source framework for voice and multimodal conversational AI ☆3,391 · Updated this week
- 🪄 Create rich visualizations with AI ☆1,337 · Updated 2 weeks ago
- High-performance retrieval engine for unstructured data ☆987 · Updated this week
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,581 · Updated last week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆6,087 · Updated last week
- Things you can do with the token embeddings of an LLM ☆1,378 · Updated last week
- Document to Markdown OCR library with Llama 3.2 vision ☆1,443 · Updated last week
- Easy token price estimates for 400+ LLMs. TokenOps. ☆1,468 · Updated this week
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,548 · Updated 3 months ago
- Local realtime voice AI ☆1,946 · Updated this week
- This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats. ☆1,102 · Updated last month
- LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs ☆1,501 · Updated 3 weeks ago
- The Open Source Memory Layer For Autonomous Agents ☆1,489 · Updated last month
- Empowering RAG with a memory-based data interface for all-purpose applications! ☆1,228 · Updated 2 weeks ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ☆3,267 · Updated 3 months ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆5,677 · Updated 2 weeks ago
- Fast and accurate automatic speech recognition (ASR) for edge devices ☆2,201 · Updated last week
- First base model for full-duplex conversational audio ☆1,574 · Updated last week
- Containerized, state of the art Retrieval-Augmented Generation (RAG) system with a RESTful API ☆3,662 · Updated this week
- A fast multimodal LLM for real-time voice ☆1,366 · Updated this week
- RAG that intelligently adapts to your use case, data, and queries ☆1,972 · Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆686 · Updated this week
- Parse files for optimal RAG ☆3,199 · Updated last week
- Build real-time multimodal AI applications 🤖🎙️📹 ☆4,032 · Updated this week