Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓
☆36Apr 3, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Reasoning
Users who are interested in Awesome-MLLM-Reasoning are comparing it to the libraries listed below.
- R1-Vision: Let's first take a look at the image☆48Feb 16, 2025Updated last year
- A benchmark for the task of translation suggestion☆60Jun 23, 2022Updated 3 years ago
- ☆23Nov 19, 2024Updated last year
- The GitHub repository of the paper "Understanding Differential Search Index for Text Retrieval" in ACL 2023 Findings.☆16May 21, 2023Updated 3 years ago
- To try TextIn document parsing, visit https://cc.co/16YSIy☆22Jul 9, 2024Updated last year
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆61Aug 25, 2025Updated 9 months ago
- [NAACL 2025 Main Selected Oral] Repository for the paper: Prompt Compression for Large Language Models: A Survey☆37May 18, 2025Updated last year
- Cost-Sensitive Toolpath Agent for Multi-turn Image Editing☆31Mar 26, 2025Updated last year
- ☆37Jun 28, 2021Updated 4 years ago
- Latest Advances on System-2 Reasoning☆1,351Jun 8, 2025Updated last year
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources☆218Sep 26, 2025Updated 8 months ago
- ☆25Mar 17, 2026Updated 2 months ago
- Recent Advances in Visual Dialog☆28Aug 19, 2022Updated 3 years ago
- KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques☆51Dec 9, 2024Updated last year
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning☆48Nov 8, 2023Updated 2 years ago
- A fork to add multimodal model training to open-r1☆1,566Feb 8, 2025Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆845May 14, 2025Updated last year
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations☆19Jan 19, 2025Updated last year
- Awesome Entity Alignment is a collection of EA techniques, including papers, codes, and datasets.☆11Oct 27, 2022Updated 3 years ago
- [INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation☆41Sep 1, 2023Updated 2 years ago
- Combines InstantID🔥 and FouriScale to generate high-resolution images!☆11Apr 3, 2024Updated 2 years ago
- Structured parsing of electronic medical records☆13May 11, 2022Updated 4 years ago
- Ship remote sensing dataset☆12Jun 28, 2022Updated 3 years ago
- [CVPR2026] BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers☆35Mar 17, 2026Updated 2 months ago
- ☆37Jan 31, 2024Updated 2 years ago
- Visual Dialog: Light-weight Transformer for Many Inputs (ECCV 2020)☆29Aug 5, 2021Updated 4 years ago
- ☆13Jul 22, 2024Updated last year
- [Preprint] GMem: A Modular Approach for Ultra-Efficient Generative Models☆43Mar 11, 2025Updated last year
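- This project uses the Firefly model training framework to fine-tune the LLAMA-2 model on the multiple-choice reading comprehension task (Multiple Choice MRC), achieving significant improvement.☆11Sep 16, 2023Updated 2 years ago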
- Hohai University daily health check-in☆12Dec 4, 2021Updated 4 years ago
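- Uses Kinect 2.0 to read image depth and other information and segment the hand image, extracts hand-image features with HOG, then trains an SVM on them. The goal is gesture recognition.☆10Jan 8, 2020Updated 6 years ago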
- NAR-BERT-ASR☆10Sep 27, 2021Updated 4 years ago
- ☆12Nov 28, 2022Updated 3 years ago
- ☆67Sep 18, 2024Updated last year
- ✨First Open-Source R1-like Video-LLM [2025/02/18]☆383Feb 23, 2025Updated last year
- Chinese whole-word masking (wwm) and n-gram masking based on jieba☆11Jul 25, 2019Updated 6 years ago
- Pre-trained Wav2vec2.0 for Mandarin☆43Oct 30, 2022Updated 3 years ago
- End-to-end Speech Translation☆35Apr 12, 2021Updated 5 years ago