opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆4,727 Updated this week
Related projects:
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆2,489 Updated this week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆4,578 Updated last week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆4,988 Updated 3 weeks ago
- A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction. ☆11,253 Updated this week
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆10,156 Updated last week
- OCR, layout analysis, reading order, line detection in 90+ languages ☆9,849 Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆6,363 Updated this week
- Convert PDF to markdown quickly with high accuracy ☆16,438 Updated last week
- Using GPT to parse PDF ☆2,815 Updated last month
- Build AI Assistants with memory, knowledge and tools. ☆11,145 Updated this week
- An open-source RAG-based tool for chatting with your documents. ☆11,701 Updated this week
- Transformer Explained Visually: Learn How LLM Transformer Models Work with Interactive Visualization ☆2,580 Updated 2 weeks ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆6,008 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆13,879 Updated this week
- RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding. ☆17,176 Updated this week
- Run PyTorch LLMs locally on servers, desktop and mobile ☆3,100 Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆17,179 Updated this week
- 3D Visualization of a GPT-style LLM ☆3,833 Updated 3 weeks ago
- The easiest way to use Agentic RAG in any enterprise ☆3,132 Updated this week
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚ ☆6,777 Updated this week
- RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry ☆3,155 Updated this week
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆1,999 Updated 3 weeks ago
- Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory ☆15,611 Updated this week
- DeepSeek Coder: Let the Code Write Itself ☆6,530 Updated 3 months ago
- 🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper ☆2,763 Updated last week
- Neo4j graph construction from unstructured data using LLMs ☆2,072 Updated this week
- Crawlee—A web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Dow… ☆3,832 Updated this week
- Official repository of the aiXcoder-7B Code Large Language Model ☆2,183 Updated 3 weeks ago
- Ollama Python library ☆3,912 Updated this week
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,531 Updated last month