kreuzberg-dev/kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, images, and structured information from PDFs, Office documents, images, and 97+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.

☆8,486

Alternatives and similar repositories for kreuzberg

Users who are interested in kreuzberg are comparing it to the libraries listed below.

docling-project / docling
Get your documents ready for gen AI
☆61,672 • Updated this week
microsoft / markitdown
Python tool for converting files and office documents to Markdown.
☆152,866 • May 26, 2026 • Updated 3 weeks ago
marimo-team / marimo
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with gi…
☆21,445 • Updated this week
PragmaticMachineLearning / probly
☆896 • May 13, 2025 • Updated last year
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆17,387 • Mar 25, 2026 • Updated 2 months ago
janbjorge / pgqueuer
PgQueuer is a Python library leveraging PostgreSQL for efficient job queuing.
☆1,491 • Jun 5, 2026 • Updated last week
neuml / txtai
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
☆12,657 • Jun 11, 2026 • Updated last week
ariebovenberg / whenever
⏰ Modern datetime library for Python
☆2,360 • Updated this week
datalab-to / marker
Convert PDF to markdown + JSON quickly with high accuracy
☆36,101 • Jun 6, 2026 • Updated last week
tach-org / tach
A Python tool to visualize + enforce dependencies, using modular architecture 🌎 Open source 🐍 Installable via pip 🔧 Able to be adopted…
☆2,751 • Jun 11, 2026 • Updated last week
getomni-ai / zerox
OCR & Document Extraction using vision models
☆12,238 • May 20, 2025 • Updated last year
morphik-org / morphik-core
The most accurate document search and store for building AI apps
☆3,610 • May 11, 2026 • Updated last month
pydantic / pydantic-ai
AI Agent Framework, the Pydantic way
☆17,828 • Updated this week
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆20,840 • Updated this week
hatchet-dev / hatchet
🪓 An orchestration engine for background tasks, AI agents, and durable workflows
☆7,362 • Updated this week
pyper-dev / pyper
Concurrent Python made simple
☆1,520 • Feb 4, 2025 • Updated last year
goodreasonai / ScrapeServ
A self-hosted API that takes a URL and returns a file with browser screenshots.
☆1,185 • Mar 9, 2025 • Updated last year
BerriAI / litellm
Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a…
☆50,785 • Updated this week
windmill-labs / windmill
Open-source developer platform to power your entire infra and turn scripts into webhooks, workflows and UIs. Fastest workflow engine (13x…
☆16,766 • Updated this week
lmnr-ai / index
The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web
☆2,348 • Jun 9, 2025 • Updated last year
rio-labs / rio
WebApps in pure Python. No JavaScript, HTML and CSS needed
☆3,409 • Updated this week
Cinnamon / kotaemon
An open-source RAG-based tool for chatting with your documents.
☆25,467 • Jun 9, 2026 • Updated last week
mr-fatalyst / fastopenapi
FastOpenAPI is a library for generating and integrating OpenAPI schemas using Pydantic v2 and various frameworks (AioHttp, Django, Falcon…
☆507 • Mar 13, 2026 • Updated 3 months ago
GitHamza0206 / simba
OpenSource Production ready Customer service with built in Evals and monitoring
☆1,451 • Jan 12, 2026 • Updated 5 months ago
google / langextract
A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi…
☆36,898 • May 21, 2026 • Updated 3 weeks ago
mathesar-foundation / mathesar
An intuitive spreadsheet-like interface that lets users of all technical skill levels view, edit, query, and collaborate on Postgres data…
☆5,002 • Updated this week
thiswillbeyourgithub / wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, …
☆518 • May 30, 2026 • Updated 2 weeks ago
cle-b / httpdbg
A tool for Python developers to easily debug the HTTP(S) client and server requests in a Python program.
☆905 • May 1, 2026 • Updated last month
agno-agi / agno
Build, run, and manage agent platforms.
☆40,674 • Updated this week
astral-sh / uv
An extremely fast Python package and project manager, written in Rust.
☆86,352 • Updated this week
SciPhi-AI / R2R
SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
☆7,887 • Nov 7, 2025 • Updated 7 months ago
dottxt-ai / outlines
Structured Outputs
☆13,964 • May 18, 2026 • Updated last month
zauberzeug / nicegui
Create web-based user interfaces with Python. The nice way.
☆15,911 • Updated this week
autoscrape-labs / pydoll
Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions.
☆6,906 • May 24, 2026 • Updated 3 weeks ago
astral-sh / ruff
An extremely fast Python linter and code formatter, written in Rust.
☆48,037 • Updated this week
igrek51 / wat
Deep inspection of Python objects
☆1,951 • Jan 24, 2026 • Updated 4 months ago
CatchTheTornado / text-extract-api
Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…
☆3,104 • Dec 8, 2025 • Updated 6 months ago
suitenumerique / docs
A collaborative note taking, wiki and documentation platform that scales. Built with Django and React.
☆16,590 • Updated this week
QuivrHQ / MegaParse
File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
☆7,387 • Feb 21, 2025 • Updated last year