thunlp/DeepPerception

DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding

☆66

Alternatives and similar repositories for DeepPerception

Users interested in DeepPerception are comparing it to the repositories listed below.

RUCAIBox / Virgo
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆109 · May 27, 2025 · Updated 9 months ago
Sliver-g / Cardiac-CLIP
☆25 · Jan 22, 2026 · Updated last month
ZhentingWang / DUMP
☆33 · May 9, 2025 · Updated 10 months ago
zhishuifeiqian / VCR-Bench
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
☆36 · Jul 15, 2025 · Updated 8 months ago
OpenMatch / TASTE
[CIKM 2023 Oral] This is the code repo for our CIKM'23 paper "Text Matching Improves Sequential Recommendation by Reducing Popularity Bia…
☆40 · Mar 17, 2024 · Updated 2 years ago
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆137 · Sep 11, 2025 · Updated 6 months ago
jmhb0 / microvqa
[CVPR 2025] MicroVQA eval and 🤖RefineBot code for "MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research"…
☆33 · Nov 25, 2025 · Updated 3 months ago
neu-vi / struct2d
Code release for 'Struct2D: A Perception-Guided Framework for Spatial Reasoning in MLLMs' (NeurIPS 2025)
☆30 · Oct 28, 2025 · Updated 4 months ago
cetmann / robustness-interpretability
Code for the paper 'On the Connection Between Adversarial Robustness and Saliency Map Interpretability' by C. Etmann, S. Lunz, P. Maass, …
☆16 · May 9, 2019 · Updated 6 years ago
jusiro / CLIP-Conformal
[CVPR'25] Conformal prediction for vision-language models. Enhancing VLM deployment with reliability guarantees.
☆19 · Jun 7, 2025 · Updated 9 months ago
marinero4972 / CyberV
☆18 · Jun 10, 2025 · Updated 9 months ago
hulianyuyy / iLLaVA
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models
☆21 · Mar 10, 2026 · Updated last week
yale-nlp / refdpo
☆16 · Jul 23, 2024 · Updated last year
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
☆71 · Oct 17, 2025 · Updated 5 months ago
ZhangXJ199 / TinyLLaVA-Video-R1
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
☆115 · Dec 24, 2025 · Updated 2 months ago
ls-kelvin / REVPT
Code for the paper "Reinforced Vision Perception with Tools"
☆72 · Oct 3, 2025 · Updated 5 months ago
rxtan2 / Koala-video-llm
☆37 · Sep 16, 2024 · Updated last year
SUSTechBruce / Med-UniC
Official implementation of "Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias"
☆17 · Sep 22, 2023 · Updated 2 years ago
eric-ai-lab / MMWorld
Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"
☆28 · Jul 15, 2025 · Updated 8 months ago
dongyh20 / Insight-V
[CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆237 · Nov 7, 2025 · Updated 4 months ago
GMLR-Penn / Multiplex-Thinking
Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge
☆113 · Jan 30, 2026 · Updated last month
ModalMinds / MM-EUREKA
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
☆772 · Sep 7, 2025 · Updated 6 months ago
kodenii / ORES
ORES: Open-vocabulary Responsible Visual Synthesis
☆14 · Dec 12, 2023 · Updated 2 years ago
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
OpenMatch / MARVEL
[ACL 2024 Oral] This is the code repo for our ACL'24 paper "MARVEL: Unlocking the Multi-Modal Capability of Dense Retrieval via Visual Mo…
☆39 · Jun 30, 2024 · Updated last year
real-absolute-AI / NoisyRollout
[NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation
☆106 · Sep 18, 2025 · Updated 6 months ago
jihaonew / MM-Instruct
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35 · Jul 1, 2024 · Updated last year
IVY-LVLM / CODE
Official Implementation of CODE
☆17 · Sep 26, 2024 · Updated last year
ZhangAIPI / YOPO_MLLM_Pruning
Pruning the VLLMs
☆106 · Dec 9, 2024 · Updated last year
OpenMatch / UniVL-DR
[ICLR 2023] This is the code repo for our ICLR'23 paper "Universal Vision-Language Dense Retrieval: Learning A Unified Representation Spa…
☆53 · Jul 3, 2024 · Updated last year
TideDra / lmm-r1
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
☆843 · May 14, 2025 · Updated 10 months ago
NY1024 / BAP-Jailbreak-Vision-Language-Models-via-Bi-Modal-Adversarial-Prompt
☆60 · Jun 5, 2024 · Updated last year
HJYao00 / DenseConnector
[NeurIPS 2024] Dense Connector for MLLMs
☆182 · Oct 14, 2024 · Updated last year
SUSTechBruce / LOOK-M
[EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…
☆104 · Nov 9, 2024 · Updated last year
GAIR-NLP / MAYE
Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
☆148 · Apr 9, 2025 · Updated 11 months ago
shuo-git / InfECE
☆20 · Dec 31, 2020 · Updated 5 years ago
jin-s13 / MMPD-Dataset
MMPD Dataset from ECCV'2024 "When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset"
☆21 · Jul 15, 2024 · Updated last year
cuixing100876 / InstaStyle
☆15 · Jul 24, 2024 · Updated last year
realMoana / ProxyExplainer
ProxyExplainer for Graph Neural Networks
☆15 · Oct 24, 2024 · Updated last year