Latest Advances on Reasoning of Multimodal Large Language Models (Multimodal R1 \ Visual R1) 🍓
☆36Apr 3, 2025Updated last year
Alternatives and similar repositories for Awesome-MLLM-Reasoning
Users that are interested in Awesome-MLLM-Reasoning are comparing it to the libraries listed below.
- R1-Vision: Let's first take a look at the image☆48Feb 16, 2025Updated last year
- DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog☆25Mar 8, 2022Updated 4 years ago
- ☆22Nov 19, 2024Updated last year
- [ICASSP 2022] Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection☆25May 18, 2023Updated 2 years ago
- [ICASSP 2020] CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition (A PyTorch implementation of Continuous Integrate-and-…☆79Jan 9, 2025Updated last year
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs☆61Aug 25, 2025Updated 8 months ago
- ☆19Sep 19, 2024Updated last year
- The official repo for [ACM CSUR'24] "Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Op…☆12Dec 6, 2024Updated last year
- ☆37Jun 28, 2021Updated 4 years ago
- a file-based long-term memory agent skill☆25Dec 28, 2025Updated 4 months ago
- Latest Advances on System-2 Reasoning☆1,351Jun 8, 2025Updated 10 months ago
- [CVPR 2026] MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources☆218Sep 26, 2025Updated 7 months ago
- The project for speech translation☆12Sep 28, 2023Updated 2 years ago
- Recent Advances in Visual Dialog☆28Aug 19, 2022Updated 3 years ago
- ☆11Oct 20, 2022Updated 3 years ago
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning☆48Nov 8, 2023Updated 2 years ago
- A fork to add multimodal model training to open-r1☆1,532Feb 8, 2025Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆845May 14, 2025Updated 11 months ago
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations☆19Jan 19, 2025Updated last year
- [CVPR 2026] Variation-aware Vision Token Dropping for Faster Large Vision-Language Models☆31Mar 18, 2026Updated last month
- Awesome Entity Alignment is a collection of EA techniques, including papers, codes, and datasets.☆11Oct 27, 2022Updated 3 years ago
- ☆26Dec 8, 2022Updated 3 years ago
- [INTERSPEECH 2023] Knowledge Transfer from Pre-trained Language Models to Cif-based Recognizers via Hierarchical Distillation☆41Sep 1, 2023Updated 2 years ago
- Combines InstantID🔥 and FouriScale to generate high-resolution images!☆11Apr 3, 2024Updated 2 years ago
- ☆12Jul 7, 2022Updated 3 years ago
- Structured parsing of electronic medical records☆13May 11, 2022Updated 3 years ago
- ☆13Sep 25, 2024Updated last year
- Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems☆15Nov 11, 2023Updated 2 years ago
- [CVPR2026] BinaryAttention: One-Bit QK-Attention for Vision and Diffusion Transformers☆32Mar 17, 2026Updated last month
- ☆37Jan 31, 2024Updated 2 years ago
- This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…☆1,406Apr 19, 2026Updated 2 weeks ago
- ☆13Jul 22, 2024Updated last year
- Visual Dialog: Light-weight Transformer for Many Inputs (ECCV 2020)☆29Aug 5, 2021Updated 4 years ago
- Code for DATA: Differentiable ArchiTecture Approximation.☆11Jul 22, 2021Updated 4 years ago
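- This project uses the Firefly model training framework to fine-tune the LLAMA-2 model on the multiple-choice reading comprehension task (Multiple Choice MRC), achieving significant improvement.☆11Sep 16, 2023Updated 2 years ago
- Daily health check-in for Hohai University☆12Dec 4, 2021Updated 4 years ago
- Uses Kinect 2.0 to read depth and other image information and segment out the hand image. HOG extracts features from the hand image, which are then used to train an SVM. The goal is gesture recognition.☆10Jan 8, 2020Updated 6 years ago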
- NAR-BERT-ASR☆10Sep 27, 2021Updated 4 years ago
- Official repository for Gait Recognition with Drones: A Benchmark (TMM 2023)☆10Feb 2, 2024Updated 2 years ago