A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 76+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, R, C, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
★6,624 · Mar 6, 2026 · Updated this week
Alternatives and similar repositories for kreuzberg
Users who are interested in kreuzberg are comparing it to the libraries listed below.
- Get your documents ready for gen AI ★54,754 · Updated this week
- A Python tool to visualize + enforce dependencies, using modular architecture; open source; installable via pip; able to be adopted… ★2,661 · Updated this week
- A reactive notebook for Python: run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with gi… ★19,398 · Mar 2, 2026 · Updated last week
- Python tool for converting files and office documents to Markdown. ★90,316 · Feb 20, 2026 · Updated 2 weeks ago
- PgQueuer is a Python library leveraging PostgreSQL for efficient job queuing. ★1,441 · Updated this week
- Modern datetime library for Python ★2,305 · Feb 28, 2026 · Updated last week
- ★893 · May 13, 2025 · Updated 9 months ago
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows ★12,247 · Feb 25, 2026 · Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ★16,979 · Updated this week
- A self-hosted API that takes a URL and returns a file with browser screenshots. ★1,152 · Mar 9, 2025 · Updated 11 months ago
- OCR & Document Extraction using vision models ★12,155 · May 20, 2025 · Updated 9 months ago
- WebApps in pure Python. No JavaScript, HTML and CSS needed ★3,366 · Updated this week
- Convert PDF to markdown + JSON quickly with high accuracy ★32,069 · Mar 1, 2026 · Updated last week
- Run Background Tasks at Scale ★6,664 · Updated this week
- Concurrent Python made simple ★1,526 · Feb 4, 2025 · Updated last year
- A tool for Python developers to easily debug the HTTP(S) client and server requests in a Python program. ★895 · Nov 23, 2025 · Updated 3 months ago
- An intuitive spreadsheet-like interface that lets users of all technical skill levels view, edit, query, and collaborate on Postgres data… ★4,871 · Updated this week
- The most accurate document search and store for building AI apps ★3,529 · Feb 25, 2026 · Updated last week
- OCR, layout analysis, reading order, table recognition in 90+ languages ★19,392 · Mar 1, 2026 · Updated last week
- Open-source developer platform to power your entire infra and turn scripts into webhooks, workflows and UIs. Fastest workflow engine (13x… ★15,948 · Updated this week
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ★2,340 · Jun 9, 2025 · Updated 9 months ago
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ★37,994 · Updated this week
- An open-source RAG-based tool for chatting with your documents. ★25,168 · Mar 2, 2026 · Updated last week
- Open-source, production-ready customer service with built-in evals and monitoring ★1,437 · Jan 12, 2026 · Updated last month
- FastOpenAPI is a library for generating and integrating OpenAPI schemas using Pydantic v2 and various frameworks (AioHttp, Django, Falcon… ★497 · Feb 9, 2026 · Updated last month
- GenAI Agent Framework, the Pydantic way ★15,256 · Updated this week
- Deep inspection of Python objects ★1,936 · Jan 24, 2026 · Updated last month
- Lightpanda: the headless browser designed for AI and automation ★11,974 · Updated this week
- Build, run, manage agentic software at scale. ★38,516 · Updated this week
- Create web-based user interfaces with Python. The nice way. ★15,472 · Updated this week
- Pydoll is a library for automating chromium-based browsers without a WebDriver, offering realistic interactions. ★6,638 · Updated this week
- Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Dow… ★8,286 · Updated this week
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, … ★509 · Feb 25, 2026 · Updated last week
- SoTA production-ready AI retrieval system. Agentic Retrieval-Augmented Generation (RAG) with a RESTful API. ★7,711 · Nov 7, 2025 · Updated 4 months ago
- A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive vi… ★34,244 · Feb 25, 2026 · Updated last week
- Lightweight Durable Python Workflows ★1,202 · Updated this week
- Web apps in pure Python ★28,187 · Feb 23, 2026 · Updated last week
- Structured Outputs ★13,488 · Mar 2, 2026 · Updated last week
- A web framework for building products with Python. ★655 · Updated this week