CatchTheTornado / text-extract-api
Document (PDF, Word, PPTX, ...) extraction and parsing API using state-of-the-art OCRs and Ollama-supported models. Anonymize documents, remove PII, and convert any document or picture to structured JSON or Markdown.
☆2,494 · Updated last month
Alternatives and similar repositories for text-extract-api:
Users interested in text-extract-api are comparing it to the libraries listed below:
- AI reads books: Page-by-Page PDF Knowledge Extractor & Summarizer. The script performs an intelligent page-by-page analysis of PDF books, met… ☆1,430 · Updated 2 months ago
- ☆1,333 · Updated last week
- Vision infrastructure to turn complex documents into RAG/LLM-ready data ☆2,017 · Updated this week
- Document to Markdown OCR library with Llama 3.2 vision ☆2,224 · Updated 2 months ago
- 🔥 Open Source Browser API for AI Agents & Apps. Steel Browser is a batteries-included browser instance that lets you automate the web wi… ☆4,003 · Updated last week
- A visual playground for agentic workflows: Iterate over your agents 10x faster ☆3,736 · Updated this week
- Company Researcher tool helps you instantly understand any company inside out. ☆1,134 · Updated last month
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆859 · Updated last month
- An AI personal tutor built with Llama 3.1 ☆1,814 · Updated 2 months ago
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆2,818 · Updated this week
- Easily deployable and scalable backend server that efficiently converts various document formats (pdf, docx, pptx, html, images, etc) int… ☆462 · Updated 2 weeks ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆5,884 · Updated last month
- Official Firecrawl MCP Server - Adds powerful web scraping to Cursor, Claude and any other LLM clients. ☆1,650 · Updated this week
- E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with ded… ☆1,040 · Updated 6 months ago
- A powerful coding assistant application that integrates with the DeepSeek API to process user conversations and generate structured JSON … ☆1,429 · Updated last month
- A curated list of resources about AI agents for Computer Use, including research papers, projects, frameworks, and tools. ☆1,101 · Updated last week
- A Model Context Protocol server for converting almost anything to Markdown ☆981 · Updated 2 months ago
- Perplexity-style AI search engine clone built with Gemini 2.0 Flash and Grounding ☆1,951 · Updated 2 months ago
- AI computer use powered by open source LLMs and E2B Desktop Sandbox ☆944 · Updated last week
- An Open Source implementation of Notebook LM with more flexibility and features ☆1,173 · Updated last week
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,563 · Updated 3 weeks ago
- NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other ent… ☆2,604 · Updated this week
- A free and open-source, self-hosted, AI-based live meeting note taker and minutes summary generator that can completely run in your Local … ☆1,213 · Updated 2 weeks ago
- A Python package that makes it easy for developers to create AI apps powered by various AI providers. ☆1,566 · Updated 2 weeks ago
- Task-Aware Agent-driven Prompt Optimization Framework ☆3,002 · Updated this week
- ☆2,308 · Updated last week
- Awesome MCP Servers - A curated list of Model Context Protocol servers ☆1,856 · Updated last week