yigitkonur / swift-ocr-llm-powered-pdf-to-markdown
An open-source OCR API that leverages OpenAI's powerful language models with optimized performance techniques like parallel processing and batching to deliver high-quality text extraction from complex PDF documents. Ideal for businesses seeking efficient document digitization and data extraction solutions.
☆828 Updated 4 months ago
Alternatives and similar repositories for swift-ocr-llm-powered-pdf-to-markdown
Users who are interested in swift-ocr-llm-powered-pdf-to-markdown are comparing it to the libraries listed below:
- Web scraper made with AI and simplicity in mind. It runs as a CLI that can be parallelized and outputs high-quality markdown content. ☆510 Updated last week
- Vision model based document ingestion ☆1,647 Updated this week
- A lightweight task engine for building stateful AI agents that prioritizes simplicity and flexibility. ☆886 Updated last month
- Open-source platform for extracting structured data from documents using AI. ☆1,252 Updated this week
- Open-source framework for exporting your personal data. ☆1,416 Updated last month
- Detect and extract tables to markdown and csv ☆726 Updated 3 weeks ago
- Omni SenseVoice: High-Speed Speech Recognition with word timestamps 🗣️🎯 ☆799 Updated last month
- ☆434 Updated 5 months ago
- Visualise your CSV files in seconds without sending your data anywhere ☆493 Updated last month
- ➖ Stripped down, stable version of firecrawl optimized for self-hosting and ease of contribution. Billing logic and AI features are compl… ☆345 Updated last week
- Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP ☆345 Updated this week
- Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages. ☆946 Updated 2 months ago
- A superfast full-text search application ☆1,054 Updated 2 months ago
- Create mind maps to learn new things using AI. ☆536 Updated 3 months ago
- DOM to Semantic-Markdown for use with LLMs ☆755 Updated 2 weeks ago
- Browser automation system that uses AI-driven planning to navigate web pages and perform goals. ☆730 Updated last month
- A fast Rust based tool to serialize text-based files in a repository or directory for LLM consumption ☆1,610 Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆807 Updated this week
- An experiment in meeting transcription and diarization with just an LLM. Maybe I went a little overboard though ☆393 Updated last week
- Open-source unstructured data (PDFs, Images, Audiofiles) processing platform built for knowledge workers ☆253 Updated 2 weeks ago
- The only fully local production-grade Super SDK that provides a simple, unified, and powerful interface for calling more than 200 LLMs. ☆414 Updated this week
- pingcap/autoflow is a Graph RAG based and conversational knowledge base tool built with TiDB Serverless Vector Storage. Demo: https://tid… ☆2,274 Updated this week
- ai for jq ☆238 Updated 5 months ago
- ☆279 Updated 2 months ago
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆2,599 Updated this week
- Attempt to create an Open Source Privacy Focused Rewind.ai Alternative for data capture ☆197 Updated 3 weeks ago
- RAG Logger is an open-source logging tool designed specifically for Retrieval-Augmented Generation (RAG) applications. It serves as a lig… ☆217 Updated last month