yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆977 Updated 2 months ago
Alternatives and similar repositories for extractous:
Users who are interested in extractous are comparing it to the libraries listed below.
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆1,754 Updated this week
- Detect and extract tables to markdown and csv ☆729 Updated last month
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆831 Updated 5 months ago
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆2,687 Updated this week
- A lightweight task engine for building stateful AI agents that prioritizes simplicity and flexibility. ☆907 Updated 2 months ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,399 Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆852 Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,099 Updated last week
- A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption ☆1,725 Updated this week
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,518 Updated this week
- High-performance retrieval engine for unstructured data ☆1,197 Updated this week
- Extract structured text from pdfs quickly ☆427 Updated this week
- A system for agentic LLM-powered data processing and ETL ☆1,695 Updated this week
- Lightweight, performant, deep table extraction ☆422 Updated this week
- ☆434 Updated 5 months ago
- A realtime serving engine for Data-Intensive Generative AI Applications ☆968 Updated this week
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,626 Updated 2 weeks ago
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆789 Updated 3 weeks ago
- Visualise your CSV files in seconds without sending your data anywhere ☆501 Updated last month
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP ☆382 Updated this week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,568 Updated this week
- Parse PDFs into markdown using Vision LLMs ☆294 Updated 3 weeks ago
- Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs. ☆457 Updated last week
- The fast, Pythonic way to build Model Context Protocol servers 🚀 ☆1,136 Updated 2 months ago
- Profile-Based Long-Term Memory for AI Applications ☆796 Updated this week
- Fast Semantic Text Deduplication ☆546 Updated this week