yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆682 · Updated last month
Alternatives and similar repositories for extractous:
Users who are interested in extractous compare it to the libraries listed below:
- ☆715 · Updated 2 weeks ago
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆702 · Updated 2 weeks ago
- Detect and extract tables to markdown and csv ☆723 · Updated this week
- Prompt optimization scratch ☆574 · Updated this week
- Parse PDFs into markdown using Vision LLMs ☆224 · Updated this week
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆2,338 · Updated this week
- Vision model based document ingestion ☆1,312 · Updated this week
- ☆432 · Updated 4 months ago
- A lightweight task engine for building stateful AI agents that prioritizes simplicity and flexibility. ☆878 · Updated 3 weeks ago
- Lightweight, performant, deep table extraction ☆393 · Updated last month
- Your first AI prompt engineer ☆360 · Updated 2 months ago
- An Open Source implementation of Notebook LM with more flexibility and features ☆927 · Updated 2 months ago
- Browser automation system that uses AI-driven planning to navigate web pages and perform goals. ☆703 · Updated 2 weeks ago
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,229 · Updated this week
- Easily deployable 🚀 API to convert PDF to markdown quickly with high accuracy. ☆796 · Updated 3 months ago
- High-performance retrieval engine for unstructured data ☆1,128 · Updated 2 weeks ago
- Structured information extraction from documents ☆299 · Updated 4 months ago
- Fast Semantic Text Deduplication ☆472 · Updated this week
- ☆364 · Updated 2 months ago
- E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with ded… ☆892 · Updated 4 months ago
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ☆376 · Updated 2 months ago
- An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing an… ☆813 · Updated 4 months ago
- Excalidraw meets ComfyUI for LLMs ☆221 · Updated last week
- Colivara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆220 · Updated 2 weeks ago
- Serverless Modal + FastAPI + React + ColPali + Qdrant + GPT4o Vision RAG (V-RAG) Demo ☆340 · Updated 2 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ☆820 · Updated last week
- Visualise your CSV files in seconds without sending your data anywhere ☆465 · Updated 3 weeks ago
- End-to-End Local-First Text-to-SQL Pipelines ☆271 · Updated last month
- Yet another open source Perplexity ☆406 · Updated 3 months ago