allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆12,940 · Updated this week
Alternatives and similar repositories for olmocr
Users that are interested in olmocr are comparing it to the libraries listed below
- OCR & Document Extraction using vision models ☆11,350 · Updated last month
- An open-source RAG-based tool for chatting with your documents. ☆22,483 · Updated last week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆8,862 · Updated last month
- 🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation ☆17,060 · Updated last week
- Build Real-Time Knowledge Graphs for AI Agents ☆11,432 · Updated this week
- The Python library for real-time communication ☆4,037 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,641 · Updated last week
- Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc. ☆9,608 · Updated this week
- 🪄 Create rich visualizations with AI ☆12,419 · Updated last week
- Full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning. ☆28,467 · Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆40,227 · Updated this week
- Fully local web research and report writing assistant ☆7,601 · Updated 2 months ago
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,586 · Updated last week
- 🚀 The fast, Pythonic way to build MCP servers and clients ☆12,692 · Updated this week
- DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execut… ☆13,606 · Updated this week
- A high-quality tool for converting PDF to Markdown and JSON: a one-stop, open-source, high-quality data extraction tool. ☆35,508 · Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆6,517 · Updated 3 months ago
- Get your documents ready for gen AI ☆31,854 · Updated last week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆7,865 · Updated 5 months ago
- KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning a… ☆7,239 · Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆22,426 · Updated 2 months ago
- The easiest tool for fine-tuning LLM models, synthetic data generation, and collaborating on datasets. ☆3,782 · Updated this week
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆6,274 · Updated 3 weeks ago
- Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM. ☆40,545 · Updated last week
- No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets In Minutes ☆13,043 · Updated this week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆24,159 · Updated this week
- Python scraper based on AI ☆20,007 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆15,276 · Updated this week
- Janus-Series: Unified Multimodal Understanding and Generation Models ☆17,380 · Updated 4 months ago
- Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚ ☆28,527 · Updated 2 months ago