HC-Guo/Awesome-Multimodal-Chain-of-Thought

Collection of papers and repos for multimodal chain-of-thought

☆89

Alternatives and similar repositories for Awesome-Multimodal-Chain-of-Thought

Users who are interested in Awesome-Multimodal-Chain-of-Thought are comparing it to the repositories listed below.

Video-R1 / Awesome-Multimodal-Reasoning
Collections of Papers and Projects for Multimodal Reasoning.
☆108 · Apr 25, 2025 · Updated last year
MCLAB-OCR / KnowledgeMiningWithSceneText
☆38 · Feb 4, 2023 · Updated 3 years ago
Tanveer81 / RGNet
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20 · Mar 3, 2025 · Updated last year
icq-benchmark / icq-benchmark
☆19 · Jul 28, 2025 · Updated 11 months ago
gimpong / AAAI25-S5VH
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
☆24 · Aug 2, 2025 · Updated 11 months ago
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆26 · Feb 2, 2025 · Updated last year
huangmozhi9527 / GMMFormer
[AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval
☆21 · May 10, 2024 · Updated 2 years ago
cs-holder / Reasoning-Self-Evolution-Survey
☆54 · Mar 6, 2025 · Updated last year
gimpong / WWW22-HCQ
The code for the paper "Hybrid Contrastive Quantization for Efficient Cross-View Video Retrieval" (WWW'22, Oral).
☆17 · Mar 8, 2022 · Updated 4 years ago
gimpong / CVPR25-Condenser
The code for the paper "Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning" (CVPR'25).
☆16 · Sep 25, 2025 · Updated 10 months ago
lijun2005 / Awesome-Partially-Relevant-Video-Retrieval
A paper list of partially relevant video retrieval
☆42 · Updated this week
CMMMU-Benchmark / CMMMU
☆48 · Sep 5, 2024 · Updated last year
Kelly510 / RehabExerAssess
[TNSRE 2023] The official PyTorch code for A Skeleton-Based Rehabilitation Exercise Assessment System With Rotation Invariance
☆11 · Jul 27, 2025 · Updated 11 months ago
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
☆1,017 · May 22, 2026 · Updated 2 months ago
Sun-Haoyuan23 / Awesome-RL-based-Reasoning-MLLMs
This repository provides valuable reference for researchers in the field of multimodality, please start your exploratory travel in RL-bas…
☆1,435 · May 11, 2026 · Updated 2 months ago
zenghy96 / Reliable-Source-Approximation
Reliable Source Approximation: Source-Free Domain Adaptation for Vestibular Schwannoma MRI Segmentation
☆11 · Dec 28, 2024 · Updated last year
dongyh20 / Insight-V
[CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
☆240 · Nov 7, 2025 · Updated 8 months ago
THUNLP-MT / Brote
☆11 · Jan 19, 2025 · Updated last year
NExTplusplus / TAT-DQA
TAT-DQA: Towards Complex Document Understanding By Discrete Reasoning
☆24 · Sep 17, 2024 · Updated last year
ChocoWu / SeTok
Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM
☆81 · Apr 19, 2025 · Updated last year
LightChen233 / M3CoT
☆92 · Mar 12, 2026 · Updated 4 months ago
wk-ff / GTC
reimplement of "GTC: Guided Training of CTC Towards Efficient and Accurate Scene Text Recognition"
☆15 · Nov 10, 2020 · Updated 5 years ago
T-SciQ / T-SciQ
☆22 · Jun 13, 2024 · Updated 2 years ago
tmlr-group / SMM
[ICML 2024 Spotlight] "Sample-specific Masks for Visual Reprogramming-based Prompting"
☆12 · Dec 20, 2024 · Updated last year
zchoi / VCRN
☆11 · Jul 11, 2023 · Updated 3 years ago
csguoh / KD-LTR
[MM2023] An official implement of the paper "One-stage Low-resolution Text Recognition with High-resolution Knowledge Transfer"
☆16 · Nov 3, 2023 · Updated 2 years ago
kdu4108 / semiring-backprop-exps
☆16 · Jul 10, 2023 · Updated 3 years ago
gogoczh / CoMT
code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"
☆19 · Mar 10, 2025 · Updated last year
Woody5962 / Ranked-List-Truncation
A framework for Ranked List Truncation, including the implementation of multiple existing deep models, such as BiCut, Choopy and AttnCut. …
☆14 · May 7, 2022 · Updated 4 years ago
dragen1860 / awesome-causal-reasoning
☆16 · Nov 14, 2018 · Updated 7 years ago
ThunderVVV / RCLSTR
Official PyTorch implementation of `[ACMMM 2023]Relational Contrastive Learning for Scene Text Recognition`
☆17 · Sep 22, 2023 · Updated 2 years ago
sail-sg / MMCBench
☆27 · Jan 23, 2024 · Updated 2 years ago
showlab / DiffSim
[ICCV 2025] Official repository of DiffSim: Taming Diffusion Models for Evaluating Visual Similarity
☆31 · Jul 14, 2025 · Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 · Jun 20, 2024 · Updated 2 years ago
gouba2333 / MA-HMR
☆17 · Nov 20, 2025 · Updated 8 months ago
gimpong / AAAI22-MeCoQ
The code for the paper "Contrastive Quantization with Code Memory for Unsupervised Image Retrieval" (AAAI'22, Oral).
☆37 · Oct 21, 2022 · Updated 3 years ago
Junchao-cs / LIVE
[ICML 2026] "LIVE: Long-horizon Interactive Video World ModEling"
☆35 · Jul 15, 2026 · Updated last week
GeoEval / GeoEval
This is the Repository for Geometry Problem Solving Method Evaluation
☆27 · Oct 8, 2024 · Updated last year
SpeechEE / SpeechEE
☆11 · Aug 20, 2025 · Updated 11 months ago