Goldziher / kreuzberg
Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more. Built on Pandoc, PDFium, and Tesseract.
☆2,369Updated this week
Alternatives and similar repositories for kreuzberg
Users who are interested in kreuzberg are comparing it to the libraries listed below.
- ☆864Updated 4 months ago
- Concurrent Python made simple☆1,475Updated 7 months ago
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an…☆872Updated 11 months ago
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f…☆1,075Updated last month
- Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.☆1,418Updated 9 months ago
- Open-source platform for extracting structured data from documents using AI.☆1,415Updated 4 months ago
- High Performance IDE for Jupyter Notebooks☆2,228Updated last month
- A self-hosted API that takes a URL and returns a file with browser screenshots.☆1,045Updated 6 months ago
- ContextGem: Effortless LLM extraction from documents☆1,500Updated last week
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)☆669Updated 4 months ago
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web☆2,316Updated 3 months ago
- Vision infrastructure to turn complex documents into RAG/LLM-ready data☆2,858Updated last month
- Detect and extract tables to markdown and csv☆750Updated 7 months ago
- Turn docstrings into LLM-functions☆502Updated 5 months ago
- A Python Library for Generating PDFs and Images from HTML, powered by PlutoBook☆926Updated last week
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents…☆2,877Updated last month
- CLI tool and python library to inspect databases fast.☆496Updated 3 months ago
- The most accurate document search and store for building AI apps☆3,235Updated this week
- 🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library☆2,332Updated this week
- Visualise your CSV files in seconds without sending your data anywhere☆513Updated 3 months ago
- Things you can do with the token embeddings of an LLM☆1,447Updated 5 months ago
- ☆2,020Updated 6 months ago
- A superfast full-text search application☆1,129Updated last week
- Deep inspection of Python objects☆1,906Updated last month
- A web framework for building products with Python.☆625Updated this week
- Expose the contents of .docx files without leaving your terminal. Fast, safe, and smart — no Office required!☆2,913Updated this week
- A hub for various industry-specific schemas to be used with VLMs.☆533Updated 3 months ago
- Use LLMs in Excel formulas☆883Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL☆1,076Updated this week
- A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption☆2,321Updated this week