allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆13,196 Updated this week
Alternatives and similar repositories for olmocr
Users who are interested in olmocr are comparing it to the libraries listed below.
- A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDFs to Markdown and JSON. ☆38,345 Updated this week
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆6,490 Updated this week
- OCR & Document Extraction using vision models ☆11,514 Updated last month
- A visual playground for agentic workflows: Iterate over your agents 10x faster ☆5,266 Updated this week
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆6,560 Updated 4 months ago
- Convert PDF to markdown + JSON quickly with high accuracy ☆26,360 Updated this week
- The python library for real-time communication ☆4,115 Updated this week
- Fully local web research and report writing assistant ☆7,772 Updated 2 weeks ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆8,077 Updated 6 months ago
- DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execut… ☆14,951 Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆42,389 Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆22,605 Updated 3 months ago
- KAG is a logical form-guided reasoning and retrieval framework based on OpenSPG engine and LLMs. It is used to build logical reasoning a… ☆7,438 Updated this week
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆17,767 Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,706 Updated 5 months ago
- 🔥 No Code Web Data Extraction Platform. Open Source Alternative To Octoparse🔥 ☆13,210 Updated this week
- 🚀 The fast, Pythonic way to build MCP servers and clients ☆14,267 Updated this week
- 🦉 OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation ☆17,416 Updated last week
- Get your documents ready for gen AI ☆33,862 Updated this week
- The Open All-in-One Multimodal AI Agent Stack connecting Cutting-edge AI Models and Agent Infra. ☆15,126 Updated this week
- 🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN ☆47,392 Updated this week
- 🪄 Create rich visualizations with AI ☆12,676 Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,599 Updated last month
- Suna - Open Source Generalist AI Agent ☆16,501 Updated this week
- No fortress, purely open ground. OpenManus is Coming. ☆47,824 Updated last week
- Python tool for converting files and office documents to Markdown. ☆60,176 Updated last month
- Yet Another Document Translator ☆4,551 Updated this week
- A video translation and dubbing tool powered by LLMs, offering professional-grade translations and one-click full-process deployment. It… ☆8,016 Updated this week
- Vision agent ☆4,912 Updated last week
- ☆4,279 Updated this week