yuhui-zh15 / C3
Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)
☆28 · Updated last month
Related projects
Alternatives and complementary repositories for C3
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning". ☆34 · Updated 8 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024). ☆51 · Updated last month
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision. ☆47 · Updated 4 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning. ☆73 · Updated 6 months ago
- Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning". ☆28 · Updated 7 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models. ☆45 · Updated 5 months ago
- [ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models". ☆20 · Updated last month
- [ACL 2024, Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback. ☆52 · Updated 2 months ago
- ☕️ CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion. ☆27 · Updated 5 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".