yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,178 · Updated 6 months ago
Alternatives and similar repositories for extractous
Users who are interested in extractous are comparing it to the libraries listed below.
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,269 · Updated last week
- Detect and extract tables to markdown and csv ☆748 · Updated 5 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,470 · Updated 2 weeks ago
- A text extraction library supporting PDFs, images, office documents and more ☆1,943 · Updated this week
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆451 · Updated last week
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,058 · Updated last month
- Lightweight, performant, deep table extraction ☆487 · Updated this week
- High-performance retrieval engine for unstructured data ☆1,439 · Updated 3 weeks ago
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆863 · Updated 9 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆1,033 · Updated 3 weeks ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆656 · Updated last month
- Production-ready Inference, Ingestion and Indexing built in Rust 🦀 ☆647 · Updated 2 weeks ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,608 · Updated this week
- A realtime serving engine for Data-Intensive Generative AI Applications ☆1,028 · Updated this week
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,298 · Updated last month
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,702 · Updated last month
- Extract structured text from pdfs quickly ☆509 · Updated last month
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,701 · Updated this week
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆1,147 · Updated 2 months ago
- OCR Benchmark ☆523 · Updated last month
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,688 · Updated 4 months ago
- 🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library ☆1,725 · Updated this week
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int… ☆633 · Updated 4 months ago
- ➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are compl… ☆489 · Updated last month
- A system for agentic LLM-powered data processing and ETL ☆2,340 · Updated this week
- Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs. ☆916 · Updated this week
- Improved file parsing for LLM’s ☆3,013 · Updated 8 months ago
- All-in-one platform for search, recommendations, RAG, and analytics offered via API ☆2,347 · Updated this week
- Self-hosted voice chat with LLMs ☆432 · Updated 4 months ago
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,302 · Updated last month