zhangguanghao523 / CMMCoT
Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation
☆11 · Updated 4 months ago
Alternatives and similar repositories for CMMCoT
Users who are interested in CMMCoT are comparing it to the repositories listed below
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos · ☆83 · Updated 5 months ago
- Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning · ☆63 · Updated 4 months ago
- [NeurIPS 2024] Visual Perception by Large Language Model's Weights · ☆45 · Updated 5 months ago
- [CVPR 2025] RAP: Retrieval-Augmented Personalization · ☆69 · Updated last month
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency · ☆52 · Updated 3 months ago
- ☆81 · Updated 10 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection · ☆117 · Updated last month
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation · ☆215 · Updated last month
- [NeurIPS 2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" · ☆192 · Updated 2 months ago
- This repository is the official implementation of "Look-Back: Implicit Visual Re-focusing in MLLM Reasoning". · ☆56 · Updated 2 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" · ☆36 · Updated last year
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training. · ☆54 · Updated this week
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification · ☆35 · Updated 5 months ago
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images" · ☆137 · Updated last month
- MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models · ☆41 · Updated 5 months ago
- ☆122 · Updated 6 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos · ☆135 · Updated 8 months ago