zhangguanghao523/CMMCoT

[AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

☆11

Alternatives and similar repositories for CMMCoT

Users that are interested in CMMCoT are comparing it to the libraries listed below.

aburns4 / textualforesight
☆12 · Aug 8, 2024 · Updated last year
dk-liang / AutoScale_regression
An implementation of AutoScale regression-based method
☆12 · Oct 27, 2020 · Updated 5 years ago
godspeedcurry / godscan
☆26 · Jan 13, 2026 · Updated 6 months ago
THUNLP-MT / ActiView
☆11 · Dec 20, 2024 · Updated last year
SpeechEE / SpeechEE
☆11 · Aug 20, 2025 · Updated 11 months ago
deepglint / RealSyn
[ACM MM2025] The official repository for the RealSyn dataset
☆39 · Dec 14, 2025 · Updated 7 months ago
haoyi-duan / DG-SCT
NeurIPS'2023 official implementation code
☆70 · Nov 11, 2023 · Updated 2 years ago
RupertLuo / VoCoT
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
☆79 · Jul 13, 2024 · Updated 2 years ago
BAAI-Humanoid / RobotBridge
Unified Sim2Sim and Sim2Real Deployment Framework for Humanoid Robots - Plug and Play
☆24 · Mar 7, 2026 · Updated 4 months ago
mengzaiqiao / awesome-natural-language-reasoning
A collection of research papers related to Natural Language Reasoning
☆10 · May 27, 2022 · Updated 4 years ago
MME-Benchmarks / MME-Unify
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42 · Apr 10, 2025 · Updated last year
jasongief / CPSP
[2022 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line
☆32 · Mar 6, 2023 · Updated 3 years ago
jylei16 / Imagine-e
☆14 · Jan 22, 2025 · Updated last year
fperazzi / davis
☆10 · Aug 1, 2021 · Updated 4 years ago
xtong-zhang / Chain-of-Focus
☆70 · Dec 5, 2025 · Updated 7 months ago
yuanc3 / DATE
Use 2 lines to empower absolute time awareness for Qwen2.5VL's MRoPE
☆29 · Sep 20, 2025 · Updated 10 months ago
hrh6666 / Flexible-Locomotion-Learning-with-Diffusion-Model-Predictive-Control
Official Implementation of Flexible Locomotion Learning with Diffusion Model Predictive Control
☆28 · Apr 18, 2026 · Updated 3 months ago
ShawnChenn / FlexibleReflectionRemoval
[AAAI'25] Flexible Image Reflection Removal with Sparse Human Guidance
☆12 · Jul 7, 2025 · Updated last year
Michaelszj / bags
☆11 · Dec 11, 2024 · Updated last year
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240 · Nov 7, 2025 · Updated 8 months ago
GasolSun36 / SURf
[EMNLP 2024] SURf: Teaching Large Vision-Language Models to Selectively Utilize Retrieved Information
☆11 · Oct 11, 2024 · Updated last year
WeijieMax / CPC-Trans
[MICCAI 2022] Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency
☆14 · Nov 8, 2024 · Updated last year
YuanLi95 / KECPM
This is the code for Few-Shot Joint Multimodal Entity-Relation Extraction via Knowledge-Enhanced Cross-modal Prompt Model (ACM MM 2024)
☆12 · Aug 27, 2024 · Updated last year
TobyYang7 / Llava_Qwen2
Visual Instruction Tuning for Qwen2 Base Model
☆43 · Jun 29, 2024 · Updated 2 years ago
marmot-xy / CMBS
Cross-modal background suppression for audio-visual event localization
☆36 · Mar 18, 2022 · Updated 4 years ago
FSoft-AI4Code / VisualCoder
[NAACL 2025] Guiding Large Language Models in Code Execution with Fine-grained Multimodal Chain-of-Thought Reasoning
☆10 · Feb 9, 2025 · Updated last year
THUKElab / MESED
[AAAI 2024] MESED: A Multi-modal Entity Set Expansion Dataset with Fine-grained Semantic Classes and Hard Negative Entities
☆15 · Apr 26, 2024 · Updated 2 years ago
ChenyuHeidiZhang / VL-commonsense
☆14 · May 23, 2022 · Updated 4 years ago
boreng0817 / IFCap
[EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
☆15 · May 13, 2025 · Updated last year
zhiweihu1103 / ET-TET
[EMNLP2022] Transformer-based Entity Typing in Knowledge Graphs
☆15 · Nov 26, 2024 · Updated last year
Aurora-slz / MM-Verify
☆19 · Oct 28, 2025 · Updated 8 months ago
HeimingX / TAG
Official code for Attention-driven GUI Grounding, AAAI2025
☆15 · Dec 17, 2024 · Updated last year
Yifannnnnnnnw / ai-dispatch
Your daily AI intelligence dispatch to Email 📧 · Robotics, Agents & LLMs analyzed by Claude Opus · Daily multi-source aggregation + in-depth analysis, one-click deployment via GitHub Actions, no service…
☆21 · Updated this week
mybearyZhang / TwoStageReason
Official implementation of ECCV 2024 paper: Take A Step Back: Rethinking the Two Stages in Visual Reasoning
☆13 · Jun 1, 2025 · Updated last year
MCG-NJU / VideoChat-Online
[CVPR 2025] Online Video Understanding: OVBench and VideoChat-Online
☆97 · Oct 7, 2025 · Updated 9 months ago
VisuLogic-Benchmark / VisuLogic-Train
☆21 · Jul 9, 2025 · Updated last year
SPIRAL-MED / CP_ENV
☆15 · Dec 15, 2025 · Updated 7 months ago
dk-liang / Awesome-GPT4-with-Applications
Awesome GPT-4 with Applications. This is a collection of resources related to GPT-4, including news, official documents, demo and applica…
☆20 · Mar 15, 2023 · Updated 3 years ago
ninibymilk / PMF-MMEA
[ACL2024] Progressively Modality Freezing for Multi-Modal Entity Alignment
☆19 · Apr 10, 2025 · Updated last year