yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,628 Updated 10 months ago
Alternatives and similar repositories for extractous
Users who are interested in extractous are comparing it to the libraries listed below.
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,908 Updated last month
- Detect and extract tables to markdown and csv ☆755 Updated 9 months ago
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆678 Updated 6 months ago
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,084 Updated 3 months ago
- 🦛 CHONK docs with Chonkie ✨ — The lightweight ingestion library for fast, efficient and robust RAG pipelines ☆3,203 Updated this week
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,770 Updated last week
- The most accurate document search and store for building AI apps ☆3,369 Updated this week
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,801 Updated 2 months ago
- Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more.… ☆2,514 Updated this week
- A realtime serving engine for Data-Intensive Generative AI Applications ☆1,067 Updated this week
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,680 Updated last month
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,455 Updated 2 months ago
- Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust 🦀 ☆771 Updated last week
- Extract structured text from pdfs quickly ☆624 Updated 5 months ago
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int… ☆724 Updated 8 months ago
- High-performance retrieval engine for unstructured data ☆1,525 Updated last week
- ContextGem: Effortless LLM extraction from documents ☆1,718 Updated this week
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆875 Updated last year
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,321 Updated 5 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆517 Updated this week
- Rust implementation of DeepSeek-OCR with OpenAI-compatible server & CLI. No Python environment needed - just download and run. ☆1,880 Updated this week
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, advanced RAG, advanced summaries, scriptable, … ☆490 Updated this week
- Lightweight, performant, deep table extraction ☆515 Updated 3 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,776 Updated 8 months ago
- Improved file parsing for LLMs ☆3,135 Updated last year
- Fast State-of-the-Art Static Embeddings ☆1,900 Updated last week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,760 Updated this week
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,366 Updated 3 weeks ago
- Parse PDFs into markdown using Vision LLMs ☆442 Updated last month
- Self-hosted platform for secure execution of untrusted user or AI-generated code ☆4,025 Updated this week