A collection of awesome works on reasoning models such as O1/R1 in the visual domain
☆54Jul 21, 2025Updated 9 months ago
Alternatives and similar repositories for awesome-deep-multimodal-reasoning
Users that are interested in awesome-deep-multimodal-reasoning are comparing it to the libraries listed below.
- Official implementation of our NeurIPS 2024 paper "Boundary Matters: A Bi-Level Active Finetuning Method"☆14Feb 11, 2025Updated last year
- Multi-step reasoning MLLM☆21Mar 8, 2026Updated last month
- Checkpoints, logs and source code for AAAI-23 paper 'Data-Efficient Image Quality Assessment with Attention-Panel Decoder'☆39Apr 3, 2024Updated 2 years ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆23Feb 23, 2025Updated last year
- ☆14Jul 14, 2025Updated 9 months ago
- [ICML 2024] PyTorch implementation for "Diversified Batch Selection for Training Acceleration"☆10Jul 30, 2024Updated last year
- [EMNLP25 Main] The official code of "Gradient-Attention Guided Dual-Masking Synergetic Framework for Robust Text-based Person Retrieval"☆26Mar 30, 2026Updated last month
- ☆12Sep 11, 2020Updated 5 years ago
- Cross-modal Active Complementary Learning with Self-refining Correspondence (NeurIPS 2023, Pytorch Code)☆15Jun 6, 2024Updated last year
- The code of MGCC: Text-based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning☆20Feb 26, 2025Updated last year
- [CVPR 2024 Highlight] Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering☆10Jul 29, 2024Updated last year
- Agent-RRM: Exploring Reasoning Reward Model for Agents☆63Mar 17, 2026Updated last month
- ☆11Jun 28, 2020Updated 5 years ago
- ADUULM-360 dataset access, tools, and baseline models☆10Sep 11, 2024Updated last year
- ☆44Sep 15, 2025Updated 7 months ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA☆32Nov 27, 2025Updated 5 months ago
- code for Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection☆19Mar 4, 2024Updated 2 years ago
- ROSE: Robust Cross Supervision with Neighborhood Mining for Source-free Graph Domain Adaptation☆20Oct 22, 2024Updated last year
- [ICLR 2026] Official PyTorch implementation for "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding"☆63Dec 26, 2025Updated 4 months ago
- ☆17Jun 10, 2025Updated 10 months ago
- Advances in recent large vision language models (LVLMs)☆15Sep 23, 2024Updated last year
- A collection of multimodal dialogue system papers I have read (with some notes)☆22Jan 5, 2023Updated 3 years ago
- ☆12Feb 2, 2026Updated 2 months ago
- ☆14May 25, 2024Updated last year
- [ICLR 2026]🚀ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum—cold-start pre-training, mul…☆211Dec 10, 2025Updated 4 months ago
- Unofficial implementation of PointNet and PointNet++☆10Oct 26, 2023Updated 2 years ago
- [NeurIPS 2024 Datasets and Benchmarks Track] Benchmarking PtO and PnO Methods in the Predictive Combinatorial Optimization Regime☆24Mar 27, 2025Updated last year
- CVPR 2022 paper☆16Jun 9, 2022Updated 3 years ago
- Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models☆120Jul 7, 2025Updated 9 months ago
- SODA-D Small Object Detection Toolbox and Benchmark☆43Feb 27, 2025Updated last year
- Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs☆14Apr 23, 2026Updated last week
- Papers of "A Survey on Multimodal LLMs from the Perspective of Input-Output Space Extension"☆17Feb 4, 2026Updated 2 months ago
- Code for "Speaker Clustering using Dominant Sets", ICPR 2018☆11Nov 28, 2020Updated 5 years ago
- Human Pose Classification☆16Feb 19, 2023Updated 3 years ago
- [Pattern Recognition, 2020] Covariance Descriptors on a Gaussian Manifold and their Application to Image Set Classification☆12May 28, 2022Updated 3 years ago
- (ACL 2025) Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation☆12May 21, 2025Updated 11 months ago
- TransRefer3D: Entity-and-Relation Aware Transformer for Fine-Grained 3D Visual Grounding [ACM MM'21]☆20Apr 23, 2022Updated 4 years ago
- Res2Net for Pose Estimation using Simple Baselines as the baseline☆36Oct 12, 2021Updated 4 years ago
- ☆16Nov 14, 2022Updated 3 years ago