chengtan9907 / mc-cotLinks

The official implementation of the ECCV'24 paper MC-CoT: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training.

☆24

Alternatives and similar repositories for mc-cot

Users that are interested in mc-cot are comparing it to the libraries listed below

Sorting:

OpenSparseLLMs / CLIP-MoE
CLIP-MoE: Mixture of Experts for CLIP
☆42Updated 9 months ago
GasolSun36 / MVP
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
☆22Updated 10 months ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆56Updated 8 months ago
JieShibo / MemVP
[ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
☆49Updated last year
csarron / PuMer
[ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
☆32Updated 9 months ago
yfzhang114 / LLaVA-Align
This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…
☆78Updated 4 months ago
scale-lab / MTLoRA
The official implementation for MTLoRA: A Low-Rank Adaptation Approach for Efficient Multi-Task Learning (CVPR '24)
☆55Updated 2 weeks ago
CRIPAC-DIG / LogicCheckGPT
[ACL 2024] Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models. Detect and mitigate object hallucinatio…
☆22Updated 5 months ago
core-mm / core-mm
☆17Updated last year
mlvlab / TokenMixup
Official pytorch implementation of NeurIPS 2022 paper, TokenMixup
☆48Updated 2 years ago
findalexli / mllm-dpo
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆46Updated 8 months ago
gyhdog99 / MoCLE
MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)
☆41Updated 2 weeks ago
WillDreamer / Aurora
[NeurIPS2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model
☆88Updated last year
RayRuiboChen / Self-Filter
☆24Updated last week
mlvlab / DAPT
Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)
☆41Updated last year
ElvishElvis / LCA-on-the-line
LCA-on-the-line (ICML 2024 Oral)
☆12Updated 5 months ago
zycheiheihei / Transferable-Visual-Prompting
[CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…
☆44Updated 7 months ago
lixinustc / GraphAdapter
The efficient tuning method for VLMs
☆80Updated last year
SooLab / DDCOT
[NeurIPS 2023]DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆44Updated last year
AtsuMiyai / GL-MCM
[IJCV2025] https://arxiv.org/abs/2304.04521
☆15Updated 5 months ago
Kevinz-code / SeVa
[MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501
☆56Updated 11 months ago
Yuqifan1117 / HalluciDoctor
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆45Updated last year
JiwanChung / vlis
☆24Updated last year
ZhengYu518 / VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
☆81Updated last year
yiren-jian / BLIText
[NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
☆25Updated last year
zhiheLu / Ensemble_VLM
Official code for paper "Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models, ICML2024"
☆24Updated 5 months ago
yaolinli / DeCo
Code for DeCo: Decoupling token compression from semanchc abstraction in multimodal large language models
☆53Updated last week
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆38Updated last year
microsoft / A-CLIP
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆32Updated last year
Kwai-YuanQi / TaskGalaxy
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
☆24Updated this week