Document (PDF, Word, PPTX ...) extraction and parsing API using state-of-the-art OCR and Ollama-supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown.
★3,102 · Dec 8, 2025 · Updated 5 months ago
Alternatives and similar repositories for text-extract-api
Users that are interested in text-extract-api are comparing it to the libraries listed below.
- File Parser optimised for LLM Ingestion with no loss. Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ★7,369 · Feb 21, 2025 · Updated last year
- AI reads books: Page-by-Page PDF Knowledge Extractor & Summarizer. The script performs an intelligent page-by-page analysis of PDF books, met… ★2,130 · Jan 20, 2025 · Updated last year
- OCR & Document Extraction using vision models ★12,233 · May 20, 2025 · Updated last year
- OCR, layout analysis, reading order, table recognition in 90+ languages ★19,787 · Updated this week
- ★2,287 · Mar 17, 2025 · Updated last year
- Get your documents ready for gen AI ★60,372 · Updated this week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ★2,942 · Apr 9, 2026 · Updated last month
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ★9,682 · Jan 3, 2025 · Updated last year
- Open-source, production-ready customer service with built-in evals and monitoring ★1,451 · Jan 12, 2026 · Updated 4 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows. ★1,544 · Aug 27, 2025 · Updated 9 months ago
- A system for agentic LLM-powered data processing and ETL ★3,754 · May 20, 2026 · Updated last week
- An open-source RAG-based tool for chatting with your documents. ★25,394 · Apr 3, 2026 · Updated last month
- Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser sandbox that lets you automate the web wit… ★7,063 · May 22, 2026 · Updated last week
- Document to Markdown OCR library with Llama 3.2 vision ★2,425 · Jan 20, 2025 · Updated last year
- Convert PDF to markdown + JSON quickly with high accuracy ★35,381 · May 5, 2026 · Updated 3 weeks ago
- PDF to markdown using vision LLMs: tables, layouts, and structure preserved