yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,086 Updated 4 months ago
Alternatives and similar repositories for extractous:
Users who are interested in extractous are comparing it to the libraries listed below.
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆621 Updated last week
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆979 Updated last month
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,143 Updated this week
- Detect and extract tables to markdown and csv ☆743 Updated 3 months ago
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆852 Updated 7 months ago
- A text extraction library supporting PDFs, images, office documents and more ☆1,791 Updated last week
- Open source multi-modal RAG for building AI apps over private knowledge. ☆2,047 Updated this week
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,085 Updated this week
- OCR Benchmark ☆470 Updated 3 weeks ago
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,673 Updated this week
- Self-hosted voice chat with LLMs ☆427 Updated 2 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,626 Updated 2 months ago
- git-like rag pipeline ☆203 Updated this week
- A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption ☆2,017 Updated 2 weeks ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆926 Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,224 Updated 2 weeks ago
- High-performance retrieval engine for unstructured data ☆1,373 Updated this week
- ETL framework to turn your data AI-ready - with realtime incremental updates and support custom logic like lego. ☆984 Updated this week
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense chunking library ☆356 Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆747 Updated last week
- Things you can do with the token embeddings of an LLM ☆1,440 Updated last month
- Fast Semantic Text Deduplication & Filtering ☆659 Updated 2 weeks ago
- ☆826 Updated this week
- ☆1,490 Updated last month
- Fast, streaming indexing, query, and agentic LLM applications in Rust ☆463 Updated this week
- The AI-native proxy server for agents. Arch handles the pesky low-level work in building agentic apps like calling specific tools, routin… ☆2,531 Updated last week
- Open-source unstructured data (PDFs, Images, Audiofiles) processing platform built for knowledge workers ☆278 Updated last month
- Minimal LLM inference in Rust ☆983 Updated 6 months ago
- Lightweight, performant, deep table extraction ☆457 Updated last week
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ☆544 Updated last month