MeganTj / multimodal_alignment
☆14 Updated 2 months ago
Alternatives and similar repositories for multimodal_alignment
Users who are interested in multimodal_alignment are comparing it to the repositories listed below.
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts" ☆15 Updated 4 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆30 Updated 3 months ago
- ☆21 Updated 9 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 Updated last year
- Code and benchmark for the paper: "A Practitioner's Guide to Continual Multimodal Pretraining" [NeurIPS'24] ☆57 Updated 7 months ago
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains ☆47 Updated 3 months ago
- Project for SNARE benchmark ☆11 Updated last year
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs? ☆14 Updated 2 months ago
- Distributed Optimization Infra for learning CLIP models ☆27 Updated 10 months ago
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated 3 weeks ago
- Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images ☆12 Updated 2 months ago
- Holistic evaluation of multimodal foundation models ☆48 Updated 11 months ago
- Do Vision and Language Models Share Concepts? A Vector Space Alignment Study ☆16 Updated 8 months ago
- ☆23 Updated last month
- Unsupervised GRPO ☆41 Updated last month
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆30 Updated this week
- ☆17 Updated 8 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆48 Updated last month
- Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models ☆24 Updated 3 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards