kreuzberg-dev / kreuzberg
A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 50+ formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or use it via CLI, REST API, or MCP server.
★5,548 · Updated this week
Alternatives and similar repositories for kreuzberg
Users interested in kreuzberg are comparing it to the libraries listed below.
- Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages. ★1,670 · Updated last year
- 🦛 CHONK docs with Chonkie ✨: the lightweight ingestion library for fast, efficient and robust RAG pipelines ★3,639 · Updated this week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ★2,969 · Updated last month
- A fast Rust-based tool to serialize text-based files in a repository or directory for LLM consumption ★2,398 · Updated this week
- Concurrent Python made simple ★1,518 · Updated 11 months ago
- High-performance IDE for Jupyter Notebooks ★2,283 · Updated last month
- Data transformation framework for AI. Ultra performant, with incremental processing. Star if you like it! ★5,929 · Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ★2,932 · Updated 4 months ago
- SeekStorm: sub-millisecond full-text search library & multi-tenancy server in Rust ★1,813 · Updated this week
- ContextGem: Effortless LLM extraction from documents ★1,762 · Updated last month
- Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with int… ★1,204 · Updated 2 months ago
- The most accurate document search and store for building AI apps ★3,456 · Updated this week
- Self-hosted alternative to Google Analytics ★1,986 · Updated 4 months ago
- An open, sub-millisecond, single-executable Firebase alternative with type-safe APIs, built-in WebAssembly runtime, realtime subscription… ★4,399 · Updated last week
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ★1,090 · Updated 5 months ago
- The SOTA open-source browser agent for autonomously performing complex tasks on the web ★2,333 · Updated 7 months ago
- Main engine of the IronCalc ecosystem ★3,562 · Updated this week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ★2,817 · Updated this week
- A Python tool to visualize + enforce dependencies, using modular architecture. Open source. Installable via pip. Able to be adopted… ★2,619 · Updated this week
- Fast state-of-the-art static embeddings ★1,986 · Updated 3 weeks ago
- Expose the contents of .docx files without leaving your terminal. Fast, safe, and smart; no Office required! ★3,423 · Updated last month
- WebApps in pure Python. No JavaScript, HTML or CSS needed ★3,342 · Updated last week
- Open-source self-hosted sandboxes for AI agents ★4,479 · Updated 2 weeks ago
- HelixDB is an open-source graph-vector database built from scratch in Rust. ★3,701 · Updated this week
- A self-hosted API that takes a URL and returns a file with browser screenshots. ★1,054 · Updated 10 months ago
- Open-source platform for extracting structured data from documents using AI. ★1,462 · Updated 8 months ago
- PgQueuer is a Python library leveraging PostgreSQL for efficient job queuing. ★1,426 · Updated last month
- Extract the main content from web pages. ★3,103 · Updated last week
- Index your Gmail account to a SQLite DB and play with the data. ★1,212 · Updated 7 months ago
- Detect and extract tables to Markdown and CSV ★754 · Updated last year