yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,122 Updated 5 months ago
Alternatives and similar repositories for extractous
Users who are interested in extractous are comparing it to the libraries listed below.
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,182 Updated this week
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆645 Updated last week
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆854 Updated 8 months ago
- Detect and extract tables to markdown and csv ☆746 Updated 4 months ago
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,195 Updated last week
- A text extraction library supporting PDFs, images, office documents and more ☆1,839 Updated this week
- Production-ready Inference, Ingestion and Indexing built in Rust 🦀 ☆588 Updated this week
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,029 Updated last week
- Extract structured text from pdfs quickly ☆481 Updated this week
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆1,117 Updated last month
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,265 Updated this week
- High-performance retrieval engine for unstructured data ☆1,387 Updated this week
- Lightweight, performant, deep table extraction ☆463 Updated last week
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆426 Updated this week
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,675 Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆976 Updated this week
- Open source multi-modal RAG for building AI apps over private knowledge. ☆2,478 Updated this week
- A system for agentic LLM-powered data processing and ETL ☆1,987 Updated last week
- Browser automation system that uses AI-driven planning to navigate web pages and perform goals. ☆771 Updated 4 months ago
- OCR Benchmark ☆495 Updated this week
- Fast Semantic Text Deduplication & Filtering ☆671 Updated this week
- HelixDB is a powerful, open-source, graph-vector database built in Rust for intelligent data storage for RAG and AI. ☆1,896 Updated this week
- Fast State-of-the-Art Static Embeddings ☆1,688 Updated this week
- 📄 🧠 PageIndex: Document Index System for Reasoning-based RAG ☆970 Updated this week
- 🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library ☆1,025 Updated this week
- The AI-native proxy server for agents. Arch handles the pesky low-level work in building agentic apps like calling specific tools, routin… ☆2,641 Updated this week
- A realtime serving engine for Data-Intensive Generative AI Applications ☆1,007 Updated this week
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,652 Updated 3 months ago
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆2,090 Updated last week
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,690 Updated 2 weeks ago