datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆18,813 Updated 2 weeks ago
Alternatives and similar repositories for surya
Users who are interested in surya are comparing it to the libraries listed below.
- OCR & Document Extraction using vision models ☆11,925 Updated 5 months ago
- Convert PDF to markdown + JSON quickly with high accuracy ☆29,658 Updated last week
- Toolkit for linearizing PDFs for LLM datasets/training ☆15,772 Updated this week
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,988 Updated 9 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆8,863 Updated 10 months ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,768 Updated 8 months ago
- An open-source RAG-based tool for chatting with your documents. ☆24,597 Updated 4 months ago
- Python scraper based on AI ☆21,678 Updated 2 weeks ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,355 Updated 6 months ago
- Implementation of Nougat Neural Optical Understanding for Academic Documents ☆9,696 Updated 8 months ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,216 Updated 8 months ago
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows. ☆48,227 Updated this week
- Improved file parsing for LLMs ☆3,129 Updated 11 months ago
- SOTA Open Source TTS ☆23,988 Updated last week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,725 Updated 4 months ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆13,106 Updated 3 weeks ago
- Automate browser based workflows with AI ☆17,115 Updated this week
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆5,031 Updated last week
- Python tool for converting files and office documents to Markdown. ☆82,554 Updated 3 weeks ago
- Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management. ☆42,816 Updated this week
- Get your documents ready for gen AI ☆43,221 Updated this week
- Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sag… ☆30,606 Updated last week
- An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations. ☆27,579 Updated last month
- A Repo For Document AI ☆2,992 Updated last week
- ⚡ Easiest no code web data extraction platform • Instantly turn any website into API or spreadsheet ⚡ ☆13,816 Updated this week
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆15,899 Updated 2 months ago
- Large Action Model framework to develop AI Web Agents ☆6,190 Updated 9 months ago
- No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents ☆5,921 Updated this week
- Multi-agent framework, runtime and control plane. Built for speed, privacy, and scale. ☆34,876 Updated this week
- docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. ☆5,581 Updated last week