lumina-ai-inc/chunkr


lumina-ai-inc / chunkr

Vision infrastructure to turn complex documents into RAG/LLM-ready data

☆4,058

Alternatives and similar repositories for chunkr

Users that are interested in chunkr are comparing it to the libraries listed below.

ucbepic / docetl
A system for agentic LLM-powered data processing and ETL
☆3,951 • Jul 21, 2026 • Updated last week
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,207 • Mar 25, 2026 • Updated 4 months ago
docling-project / docling
Get your documents ready for gen AI
☆63,895 • Updated this week
getomni-ai / zerox
OCR & Document Extraction using vision models
☆12,258 • May 20, 2025 • Updated last year
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,167 • Updated this week
QuivrHQ / MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
☆7,410 • Feb 21, 2025 • Updated last year
agno-agi / agno
Build, run, and manage agent platforms.
☆41,472 • Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆37,970 • Jul 20, 2026 • Updated last week
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆37,911 • Updated this week
CatchTheTornado / text-extract-api
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…
☆3,150 • Dec 8, 2025 • Updated 7 months ago
browserbase / stagehand
The SDK For Browser Agents
☆23,659 • Updated this week
getzep / graphiti
Build Real-Time Knowledge Graphs for AI Agents
☆29,290 • Updated this week
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,663 • Jul 14, 2026 • Updated 2 weeks ago
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆9,041 • Mar 25, 2026 • Updated 4 months ago
katanemo / plano
Plano is an AI-native proxy server and data plane for agentic apps. Smart LLM routing, observability, agent orchestration, and guardrails…
☆6,904 • Updated this week
microsoft / data-formulator
🪄 Data Formulator is an interactive AI-powered data analysis system that makes it easy to connect, explore and visualize data.
☆15,986 • Updated this week
steel-dev / steel-browser
🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit…
☆7,392 • Updated this week
morphik-org / morphik-core
Open-source multimodal retrieval engine (Morphik Core). By Morphik — AI back office for skilled nursing & senior living (morphik.ai).
☆3,635 • Updated this week
unclecode / crawl4ai
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper. Don't be shy, join here: https://discord.gg/jP8KfhDhyN
☆75,339 • Updated this week
shcherbak-ai / contextgem
ContextGem: Effortless LLM extraction from documents
☆1,864 • Updated this week
mem0ai / mem0
Universal memory layer for AI Agents
☆61,841 • Updated this week
stanford-oval / storm
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
☆30,368 • Sep 30, 2025 • Updated 9 months ago
feyninc / chonkie
🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines
☆4,597 • Updated this week
simstudioai / sim
Build, deploy, and orchestrate AI agents. Sim is the central intelligence layer for your AI workforce.
☆29,232 • Updated this week
ItzCrazyKns / Vane
Vane is an AI-powered answering engine.
☆35,896 • Apr 11, 2026 • Updated 3 months ago
cocoindex-io / cocoindex
Incremental engine for long horizon agents 🌟 Star if you like it!
☆11,083 • Updated this week
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,434 • Updated this week
onyx-dot-app / onyx
Open Source AI Platform - AI Chat with advanced features that works with every LLM
☆31,236 • Updated this week
yifanfeng97 / Hyper-Extract
Hypergraph is more powerful. Transform unstructured text into structured knowledge with LLMs. Graphs, hypergraphs, and spatio-temporal ex…
☆3,197 • Updated this week
unslothai / unsloth
Unsloth is a local UI for training and running Gemma 4, Qwen3.6, DeepSeek, Kimi, GLM and other models.
☆68,965 • Updated this week
VectifyAI / PageIndex
📑 PageIndex: Document Index for Vectorless, Reasoning-based RAG
☆34,793 • Updated this week
dottxt-ai / outlines
Structured Outputs
☆15,419 • Updated this week
browser-use / browser-use
🌐 Make websites accessible for AI agents. Automate tasks online with ease.
☆107,106 • Updated this week
zaidmukaddam / scira
Scira (Formerly MiniPerplx) is a minimalistic AI-powered search engine that helps you find information on the internet and cites it too. …
☆11,815 • Mar 20, 2026 • Updated 4 months ago
aaif-goose / goose
an open source, extensible AI agent that goes beyond code suggestions - install, execute, edit, and test with any LLM
☆51,826 • Updated this week
devflowinc / trieve
All-in-one platform for search, recommendations, RAG, and analytics offered via API
☆2,698 • Jan 25, 2026 • Updated 6 months ago
airweave-ai / airweave
Open-source context retrieval layer for AI agents
☆6,504 • Jun 5, 2026 • Updated last month
screenpipe / screenpipe
YC (S26) | Record your screen 24/7 and plug into your agents. Local, private, secure. Connect to OpenClaw, Hermes agent and 100+ apps
☆20,590 • Updated this week
iii-hq / iii
Effortlessly compose, extend, and observe every service in real-time for the first time ever.
☆18,527 • Updated this week