datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆18,337 · Updated this week
Alternatives and similar repositories for surya
Users who are interested in surya are comparing it to the libraries listed below.
- Convert PDF to markdown + JSON quickly with high accuracy ☆27,942 · Updated this week
- OCR & Document Extraction using vision models ☆11,772 · Updated 3 months ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆7,785 · Updated 6 months ago
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆8,375 · Updated 7 months ago
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆9,134 · Updated 3 months ago
- File Parser optimised for LLM Ingestion with no loss 🧠 Parse PDFs, Docx, PPTx in a format that is ideal for LLMs. ☆7,094 · Updated 6 months ago
- Toolkit for linearizing PDFs for LLM datasets/training ☆13,781 · Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆6,666 · Updated 2 months ago
- An open-source RAG-based tool for chatting with your documents. ☆22,945 · Updated last month
- A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON. ☆42,085 · Updated this week
- Improved file parsing for LLMs ☆3,044 · Updated 9 months ago
- Document (PDF, Word, PPTX ...) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents… ☆2,786 · Updated 2 weeks ago
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,734 · Updated 5 months ago
- Automate browser-based workflows with LLMs and Computer Vision ☆14,089 · Updated this week
- Python scraper based on AI ☆21,077 · Updated last week
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,530 · Updated last month
- We write your reusable computer vision tools. 💜 ☆33,966 · Updated this week
- Get your documents ready for gen AI ☆36,287 · Updated this week
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆48,597 · Updated this week
- AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording ☆15,441 · Updated last week
- Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junio… ☆9,299 · Updated 2 months ago
- Retrieval Augmented Generation (RAG) chatbot powered by Weaviate ☆7,247 · Updated last month
- Structured data extraction and instruction calling with ML, LLM and Vision LLM ☆4,954 · Updated last month
- tiny vision language model ☆8,340 · Updated this week
- SOTA Open Source TTS ☆22,694 · Updated 3 weeks ago
- Implementation of Nougat Neural Optical Understanding for Academic Documents ☆9,583 · Updated 6 months ago
- Convert documents to structured data effortlessly. Unstructured is open-source ETL solution for transforming complex documents into clean… ☆12,368 · Updated last week
- Full-stack framework for building Multi-Agent Systems with memory, knowledge and reasoning. ☆31,838 · Updated this week
- Perplexica is an AI-powered search engine. It is an Open source alternative to Perplexity AI ☆23,600 · Updated last week
- A Repo For Document AI ☆2,927 · Updated this week