datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆31,421 Updated last week
Alternatives and similar repositories for marker
Users who are interested in marker are comparing it to the libraries listed below
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆19,228 Updated this week
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows. ☆53,776 Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,860 Updated this week
- OCR & Document Extraction using vision models ☆12,070 Updated 8 months ago
- The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, No-code agent builder, MCP compatibility, and more. ☆54,397 Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,748 Updated 9 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆9,203 Updated last year
- Get up and running with Kimi-K2.5, GLM-4.7, DeepSeek, gpt-oss, Qwen, Gemma and other models. ☆162,082 Updated this week
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆30,872 Updated this week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆35,429 Updated this week
- Get your documents ready for gen AI ☆52,169 Updated this week
- Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024) ☆67,023 Updated this week
- User-friendly AI Interface (Supports Ollama, OpenAI API, ...) ☆122,868 Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆69,622 Updated this week
- RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to creat… ☆72,999 Updated this week
- Production-ready platform for agentic workflow development. ☆129,130 Updated this week
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆13,915 Updated this week
- 🤖 Chat with your SQL database 📊. Accurate Text-to-SQL Generation via LLMs using Agentic Retrieval 🔄. ☆22,632 Updated last week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,794 Updated last month
- Python tool for converting files and office documents to Markdown. ☆86,605 Updated last month
- 🙌 OpenHands: AI-Driven Development ☆67,509 Updated this week
- Port of OpenAI's Whisper model in C/C++ ☆46,518 Updated this week
- screenpipe turns your computer into a personal AI that knows everything you've done. record. search. automate. all local, all private, al… ☆16,679 Updated last week
- Faster Whisper transcription with CTranslate2 ☆20,833 Updated 2 months ago
- DSPy: The framework for programming—not prompting—language models ☆32,010 Updated last week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆8,080 Updated last year
- A Gemini 2.5 Flash Level MLLM for Vision, Speech, and Full-Duplex Multimodal Live Streaming on Your Phone ☆23,054 Updated this week
- Implementation of Nougat Neural Optical Understanding for Academic Documents ☆9,828 Updated 11 months ago
- Build multi-agent systems that learn and improve with every interaction. ☆37,691 Updated this week
- Robust Speech Recognition via Large-Scale Weak Supervision ☆94,315 Updated last month