DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
☆66Jun 10, 2025Updated 9 months ago
Alternatives and similar repositories for DeepPerception
Users that are interested in DeepPerception are comparing it to the libraries listed below
Sorting:
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*☆109May 27, 2025Updated 9 months ago
- ☆25Jan 22, 2026Updated last month
- ☆33May 9, 2025Updated 10 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 8 months ago
- [CIKM 2023 Oral] This is the code repo for our CIKM‘23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…☆40Mar 17, 2024Updated 2 years ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆137Sep 11, 2025Updated 6 months ago
- [CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"…☆33Nov 25, 2025Updated 3 months ago
- Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)☆30Oct 28, 2025Updated 4 months ago
- Code for the Paper 'On the Connection Between Adversarial Robustness and Saliency Map Interpretability' by C. Etmann, S. Lunz, P. Maass, …☆16May 9, 2019Updated 6 years ago
- [CVPR'25] Conformal prediction for vision-language models. Enhancing VLMs deployment with reliability gurarantees.☆19Jun 7, 2025Updated 9 months ago
- ☆18Jun 10, 2025Updated 9 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Mar 10, 2026Updated last week
- ☆16Jul 23, 2024Updated last year
- [NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation☆71Oct 17, 2025Updated 5 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning☆115Dec 24, 2025Updated 2 months ago
- Code for paper: Reinforced Vision Perception with Tools☆72Oct 3, 2025Updated 5 months ago
- ☆37Sep 16, 2024Updated last year
- official implementation of "Med-Unic: unifying cross-lingual medical vision-language pre-training by diminishing bias"☆17Sep 22, 2023Updated 2 years ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆28Jul 15, 2025Updated 8 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models☆237Nov 7, 2025Updated 4 months ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge☆113Jan 30, 2026Updated last month
- MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning☆772Sep 7, 2025Updated 6 months ago
- ORES: Open-vocabulary Responsible Visual Synthesis☆14Dec 12, 2023Updated 2 years ago
- ☆11Oct 2, 2024Updated last year
- [ACL 2024 Oral] This is the code repo for our ACL‘24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…☆39Jun 30, 2024Updated last year
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation☆106Sep 18, 2025Updated 6 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated last year
- Official Implementation of CODE☆17Sep 26, 2024Updated last year
- Pruning the VLLMs☆106Dec 9, 2024Updated last year
- [ICLR 2023] This is the code repo for our ICLR‘23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…☆53Jul 3, 2024Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆843May 14, 2025Updated 10 months ago
- ☆60Jun 5, 2024Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs☆182Oct 14, 2024Updated last year
- [EMNLP 2024 Findings🔥] Official implementation of ": LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…☆104Nov 9, 2024Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆148Apr 9, 2025Updated 11 months ago
- ☆20Dec 31, 2020Updated 5 years ago
- MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"☆21Jul 15, 2024Updated last year
- ☆15Jul 24, 2024Updated last year
- ProxyExplainer for Graph Neural Networks☆15Oct 24, 2024Updated last year