Ucas-HaoranWei / GOT-OCR2.0
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
☆5,879 · Updated this week
Related projects
Alternatives and complementary repositories for GOT-OCR2.0
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆5,076 · Updated this week
- A Comprehensive Toolkit for High-Quality PDF Content Extraction ☆5,310 · Updated 2 weeks ago
- OCR, layout analysis, reading order, table recognition in 90+ languages ☆13,808 · Updated this week
- A one-stop, open-source, high-quality data extraction tool that supports PDF/webpage/e-book extraction. ☆13,711 · Updated this week
- Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks ☆5,347 · Updated this week
- A simple screen parsing tool towards pure vision based GUI agent ☆4,323 · Updated this week
- Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/ ☆6,879 · Updated last week
- PDF to Markdown with vision models ☆5,927 · Updated this week
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal conversational model approaching GPT-4o performance. ☆5,947 · Updated last week
- MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone ☆12,497 · Updated 2 weeks ago
- Using GPT to parse PDF ☆2,997 · Updated 3 months ago
- Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you ne… ☆5,325 · Updated this week
- Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud. ☆2,983 · Updated last month
- Enhance Tesseract OCR output for scanned PDFs by applying Large Language Model (LLM) corrections. ☆2,156 · Updated 2 months ago
- Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" ☆6,676 · Updated this week
- Multilingual Voice Understanding Model ☆3,349 · Updated 3 weeks ago
- A modular graph-based Retrieval-Augmented Generation (RAG) system ☆18,831 · Updated this week
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆2,100 · Updated 2 months ago
- Real time interactive streaming digital human ☆3,827 · Updated 2 weeks ago
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model ☆3,574 · Updated last month
- ChatOllama is an open source chatbot based on LLMs. It supports a wide range of language models and knowledge base management. ☆2,643 · Updated 2 months ago
- Convert PDF to markdown quickly with high accuracy ☆17,568 · Updated this week
- A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity… ☆6,853 · Updated this week
- Question and Answer based on Anything. ☆11,805 · Updated 2 weeks ago
- GraphRAG using Local LLMs - Features robust API and multiple apps for Indexing/Prompt Tuning/Query/Chat/Visualizing/Etc. This is meant to… ☆1,695 · Updated 2 months ago
- Speech To Speech: an effort for an open-sourced and modular GPT4-o ☆3,483 · Updated last week
- Get your documents ready for gen AI ☆7,243 · Updated this week
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆3,854 · Updated last month
- 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API. ☆18,297 · Updated this week
- Qwen2.5 is the large language model series developed by the Qwen team, Alibaba Cloud. ☆9,354 · Updated this week