Yu-xm/Modality_Gap_Theory

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Yu-xm/Modality_Gap_Theory)

Yu-xm / Modality_Gap_Theory

Modality Gap Theory

☆76

Alternatives and similar repositories for Modality_Gap_Theory

Users that are interested in Modality_Gap_Theory are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

xlyu0106 / MACT
View on GitHub
☆19Jul 31, 2025Updated 11 months ago
LsmnBmnc / Med-CMR
View on GitHub
Official code repository for Med-CMR : "A Fine-Grained Benchmark Integrating Visual Evidence and Clinical Logic for Medical Complex Multi…
☆26Dec 10, 2025Updated 7 months ago
lyrig / TokenAR
View on GitHub
TokenAR: Multiple Subject Generation via Autoregressive Token-level enhancement
☆22Mar 4, 2026Updated 4 months ago
Chenliu-svg / DiPro
View on GitHub
This is the official repository for DiPro, highlighted as a Spotlight at NeurIPS 2025.
☆16Nov 7, 2025Updated 8 months ago
latentcraft / replay
View on GitHub
[CVPR 2026] Boosting Reasoning in Large Multimodal Models via Activation Replay
☆24May 7, 2026Updated 2 months ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
diaoquesang / GL-LCM
View on GitHub
[MICCAI 2025] GL-LCM: Global-Local Latent Consistency Models for Fast High-Resolution Bone Suppression in Chest X-Ray Images
☆17Mar 12, 2026Updated 4 months ago
MME-Benchmarks / MME-Unify
View on GitHub
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆42Apr 10, 2025Updated last year
LunarShen / DsicoVLA
View on GitHub
[CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
☆22Jun 23, 2025Updated last year
zhangzjn / T3-Video
View on GitHub
[ICML 2026] Transform Trained Transformer for Accelerating Native 4K Video Generation
☆41Dec 16, 2025Updated 7 months ago
Franklin-Zhang0 / ReasonGen-R1
View on GitHub
Official respository for ReasonGen-R1
☆75Jun 23, 2025Updated last year
AIM-SKKU / ADAPT
View on GitHub
[NeurIPS 2025] Backpropagation-Free Test-Time Adaptation via Probabilistic Gaussian Alignment
☆22Mar 18, 2026Updated 4 months ago
wulin97 / Aff3DFunc
View on GitHub
ACM MM'2025 Open-Vocabulary 3D Affordance Understanding via Functional Text Enhancement and Multilevel Representation Alignment
☆18Nov 4, 2025Updated 8 months ago
NUS-Project / Landmark-of-medical-agent
View on GitHub
☆181Jun 8, 2026Updated last month
AliBahri94 / SVWA_TTA
View on GitHub
[WACV 2025-Oral Presentation] Test-Time Adaptation in Point Clouds: Leveraging Sampling Variation with Weight Averaging
☆13Mar 31, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
xlyu0106 / VisMem
View on GitHub
☆91Feb 5, 2026Updated 5 months ago
xlyu0106 / ViF
View on GitHub
[ICLR 26] Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow
☆44Oct 3, 2025Updated 9 months ago
shengliu66 / FractionalReason
View on GitHub
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17Jun 30, 2025Updated last year
Purshow / Awesome-Unified-Multimodal
View on GitHub
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆364Jan 8, 2026Updated 6 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
View on GitHub
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137Jan 30, 2026Updated 5 months ago
multimodal-art-projection / CodeCriticBench
View on GitHub
☆16Nov 1, 2025Updated 8 months ago
foreverlasting1202 / QuestA
View on GitHub
☆22Jan 2, 2026Updated 6 months ago
MCG-NJU / RGE
View on GitHub
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15Nov 29, 2025Updated 7 months ago
Chenfei-Liao / Multi-Modal-Semantic-Segmentation-Robustness-Benchmark
View on GitHub
[CVPR Workshop Best Paper Award] Benchmarking Multi-modal Semantic Segmentation under Sensor Failures: Missing and Noisy Modality Robustn…
☆19Nov 4, 2025Updated 8 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
OpenGVLab / De-focus-Attention-Networks
View on GitHub
Learning 1D Causal Visual Representation with De-focus Attention Networks
☆35Jun 7, 2024Updated 2 years ago
Yuan-Hou / Human-MME
View on GitHub
Official repository for "Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models"
☆22Dec 2, 2025Updated 7 months ago
wusize / Harmon
View on GitHub
[ICCV2025]Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆191May 21, 2025Updated last year
clovaai / ECLIPSE
View on GitHub
(CVPR 2024) ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
☆51Dec 17, 2024Updated last year
zlab-princeton / UEval
View on GitHub
UEval: A Benchmark for Unified Multimodal Generation
☆24Apr 20, 2026Updated 3 months ago
thunlp / KARL
View on GitHub
KARL: Knowledge-Aware Reasoning and Reinforcement Learning for Knowledge-Intensive Visual Grounding
☆68Apr 5, 2026Updated 3 months ago
OpenGVLab / Mono-InternVL
View on GitHub
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆109Jul 18, 2025Updated last year
Sunmmyy / OTPR
View on GitHub
Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
☆15Feb 26, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
XMUDeepLIT / UME-R1
View on GitHub
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
☆69Feb 25, 2026Updated 4 months ago
yliu-cs / PiTe
View on GitHub
[ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model
☆17Feb 13, 2025Updated last year
visual-haystacks / mirage
View on GitHub
🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"
☆27Feb 9, 2025Updated last year
HaroldChen19 / VistaDPO
View on GitHub
[ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
☆41Jun 14, 2025Updated last year
yliu-cs / SSR
View on GitHub
[NeurIPS'25] SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
☆40Oct 14, 2025Updated 9 months ago
jihaonew / MM-Instruct
View on GitHub
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment
☆35Jul 1, 2024Updated 2 years ago
clemneo / llava-interp
View on GitHub
☆86Nov 5, 2024Updated last year