Goldziher / kreuzberg
Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
☆2,483Updated this week
Alternatives and similar repositories for kreuzberg
Users who are interested in kreuzberg are comparing it to the libraries listed below:
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f…☆1,081Updated 3 months ago
- Concurrent Python made simple☆1,506Updated 9 months ago
- ☆872Updated 5 months ago
- Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.☆1,616Updated 10 months ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)☆676Updated 5 months ago
- A self-hosted API that takes a URL and returns a file with browser screenshots.☆1,047Updated 8 months ago
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,904Updated last month
- Detect and extract tables to markdown and csv☆754Updated 9 months ago
- ContextGem: Effortless LLM extraction from documents☆1,703Updated last month
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web☆2,320Updated 5 months ago
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an…☆874Updated last year
- High Performance IDE for Jupyter Notebooks☆2,263Updated last month
- A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook☆995Updated last week
- The most accurate document search and store for building AI apps☆3,347Updated last week
- Open-source platform for extracting structured data from documents using AI.☆1,449Updated 5 months ago
- Runtime installer for Python applications☆1,838Updated last week
- 🦛 CHONK docs with Chonkie ✨ — The no-nonsense RAG library☆3,115Updated this week
- Integrate LLM in any pipeline - fit/predict pattern, JSON driven flows, and built-in concurrency support.☆606Updated 8 months ago
- ☆2,054Updated 7 months ago
- Visualise your CSV files in seconds without sending your data anywhere☆515Updated last month
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,915Updated last month
- A web framework for building products with Python.☆647Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,102Updated this week
- A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption☆2,357Updated 2 weeks ago
- Turn docstrings into LLM-functions☆506Updated 7 months ago
- Use LLMs in Excel formulas☆894Updated last week
- Deep inspection of Python objects☆1,916Updated 2 months ago
- Extract and convert data from any document, images, pdfs, word doc, ppt or URL into multiple formats (Markdown, JSON, CSV, HTML) with int…☆982Updated last week
- CLI tool and Python library to inspect databases fast.☆496Updated 5 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections.☆2,768Updated 8 months ago