AlibabaResearch / AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
☆1,352 · Updated last week
Related projects:
- mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding ☆1,318 · Updated last week
- [ECCV 2024] Official code implementation of Vary: Scaling Up the Vision Vocabulary of Large Vision Language Models. ☆1,742 · Updated 2 weeks ago
- GPT4V-level open-source multi-modal model based on Llama3-8B ☆1,976 · Updated 2 weeks ago
- Netease Youdao's open-source embedding and reranker models for RAG products. ☆1,367 · Updated last week
- Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. ☆1,904 · Updated this week
- On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench) ☆434 · Updated 2 weeks ago
- Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model ☆2,489 · Updated this week
- UniTable: Towards a Unified Table Foundation Model ☆338 · Updated 3 months ago
- ☆1,704 · Updated 4 months ago
- ⚡ FlashRAG: A Python Toolkit for Efficient RAG Research ☆1,112 · Updated this week
- The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud. ☆4,786 · Updated last month
- Agent framework and applications built upon Qwen2, featuring Function Calling, Code Interpreter, RAG, and Chrome extension. ☆3,093 · Updated 2 weeks ago
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆1,747 · Updated 3 months ago
- InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output ☆2,449 · Updated 2 weeks ago
- An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...) ☆3,730 · Updated 3 weeks ago
- Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the o… ☆2,193 · Updated 2 months ago
- ☆835 · Updated 2 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆4,184 · Updated this week
- Use PEFT or full-parameter training to fine-tune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-… ☆3,412 · Updated this week
- DeepSeek-VL: Towards Real-World Vision-Language Understanding ☆2,007 · Updated 4 months ago
- Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary) ☆587 · Updated 2 weeks ago
- A curated list of resources dedicated to table recognition ☆360 · Updated 7 months ago
- Retrieval and Retrieval-augmented LLMs ☆6,824 · Updated this week
- OCR toolbox from Davar-Lab ☆731 · Updated 10 months ago
- Collecting awesome papers of RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-… ☆1,114 · Updated 3 weeks ago
- [CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance ☆5,491 · Updated last week
- 2nd-place solution of the ICDAR 2021 Competition on Scientific Literature Parsing, Task B. ☆424 · Updated 2 years ago
- 🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3) ☆794 · Updated 2 months ago
- A generalized information-seeking agent system with Large Language Models (LLMs). ☆1,075 · Updated 3 months ago
- A lightweight framework for building LLM-based agents ☆1,744 · Updated 3 weeks ago