yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,228 Updated 8 months ago
Alternatives and similar repositories for extractous
Users who are interested in extractous are comparing it to the libraries listed below.
- Detect and extract tables to markdown and csv ☆753 Updated 7 months ago
- A powerful document AI question-answering tool that connects to your local Ollama models. Create, manage, and interact with RAG systems f… ☆1,066 Updated 3 weeks ago
- A realtime serving engine for Data-Intensive Generative AI Applications ☆1,050 Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,796 Updated last week
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams) ☆665 Updated 3 months ago
- Lightweight, performant, deep table extraction ☆503 Updated 3 weeks ago
- Document intelligence framework for Python - Extract text, metadata, and structured data from PDFs, images, Office documents, and more.… ☆2,314 Updated this week
- High-performance retrieval engine for unstructured data ☆1,481 Updated last month
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆867 Updated 11 months ago
- Highly Performant, Modular and Production-ready Inference, Ingestion and Indexing built in Rust 🦀 ☆698 Updated last week
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… ☆471 Updated last week
- Extract structured text from pdfs quickly ☆585 Updated 2 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/) ☆1,665 Updated this week
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust ☆1,729 Updated this week
- OCR Benchmark ☆553 Updated 3 months ago
- Open-source LLMOps platform for hosting and scaling AI in your own infrastructure 🏓🦙 ☆1,232 Updated last week
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,742 Updated 6 months ago
- 🦛 CHONK your texts with Chonkie ✨ — The no-nonsense RAG chunking library ☆2,076 Updated this week
- Improved file parsing for LLM’s ☆3,042 Updated 9 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆1,388 Updated this week
- Fast State-of-the-Art Static Embeddings ☆1,807 Updated 2 weeks ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,638 Updated last month
- 📄🧠 PageIndex: Document Index for Reasoning-based RAG ☆1,281 Updated this week
- NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extra… ☆2,733 Updated this week
- ☆442 Updated 11 months ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL ☆1,056 Updated this week
- Things you can do with the token embeddings of an LLM ☆1,445 Updated 5 months ago
- This repo provides the server side code for llmsherpa API to connect. It includes parsers for various file formats. ☆1,263 Updated 5 months ago
- The SOTA Open-Source Browser Agent for autonomously performing complex tasks on the web ☆2,319 Updated 2 months ago
- All-in-one platform for search, recommendations, RAG, and analytics offered via API ☆2,445 Updated 2 weeks ago