bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆7,857 Updated last month
Alternatives and similar repositories for Dolphin
Users who are interested in Dolphin are comparing it to the libraries listed below.
- Multilingual Document Layout Parsing in a Single Vision-Language Model ☆5,840 Updated last month
- A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi… ☆17,129 Updated last week
- OCR model that handles complex tables, forms, handwriting with full layout. ☆3,036 Updated 2 weeks ago
- "RAG-Anything: All-in-One RAG Framework" ☆10,653 Updated last week
- 📑 PageIndex: Document Index for Reasoning-based RAG ☆4,146 Updated 2 weeks ago
- ☆2,072 Updated 8 months ago
- ContextGem: Effortless LLM extraction from documents ☆1,731 Updated 3 weeks ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,809 Updated 3 months ago
- Python library for Agentic Document Extraction from LandingAI ☆2,299 Updated 2 weeks ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆16,115 Updated last week
- 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines ☆3,326 Updated this week
- The most accurate document search and store for building AI apps ☆3,395 Updated last week
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆1,387 Updated 7 months ago
- Task-Aware Agent-driven Prompt Optimization Framework ☆3,702 Updated last month
- Embedding Atlas is a tool that provides interactive visualizations for large embeddings. It allows you to visualize, cross-filter, and se… ☆4,402 Updated 2 weeks ago
- Eigent: The World's First Multi-agent Workforce to Unlock Your Exceptional Productivity. ☆2,493 Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,915 Updated 2 months ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,244 Updated 9 months ago
- A system for agentic LLM-powered data processing and ETL ☆3,204 Updated last week
- Build, enrich, and transform datasets using AI models with no code ☆1,586 Updated last month
- ☆2,197 Updated last week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,942 Updated 2 months ago
- ☆9,845 Updated 3 months ago
- Open Source Deep Research Alternative to Reason and Search on Private Data. Written in Python. ☆7,206 Updated 2 weeks ago
- UltraRAG v2: A Low-Code MCP Framework for Building Complex and Innovative RAG Pipelines ☆2,254 Updated last week
- The absolute trainer to light up AI agents. ☆9,436 Updated this week
- AI Powered Knowledge Graph Generator ☆1,387 Updated 2 months ago
- Video-based AI memory library. Store millions of text chunks in MP4 files with lightning-fast semantic search. No database needed. ☆10,474 Updated last month
- A research prototype of a human-centered web agent ☆8,367 Updated last week
- Implementation of my RAG system that won all categories in Enterprise RAG Challenge 2 ☆1,985 Updated 6 months ago