☆13Aug 7, 2025Updated 7 months ago
Alternatives and similar repositories for Vinoground
Users that are interested in Vinoground are comparing it to the libraries listed below
Sorting:
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆24Sep 26, 2024Updated last year
- [TACL] Do Vision and Language Models Share Concepts? A Vector Space Alignment Study☆16Nov 22, 2024Updated last year
- ☆17Aug 1, 2024Updated last year
- VHTest☆15Oct 31, 2024Updated last year
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality☆21Oct 8, 2024Updated last year
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆19Jun 27, 2024Updated last year
- Visual and Embodied Concepts evaluation benchmark☆21Oct 10, 2023Updated 2 years ago
- Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"☆20Feb 27, 2026Updated last week
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆24Mar 24, 2025Updated 11 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs)☆40Aug 11, 2025Updated 6 months ago
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- MM-PRM: Enhancing Multimodal Mathematical Reasoning with Scalable Step-Level Supervision☆27May 26, 2025Updated 9 months ago
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆27Dec 5, 2023Updated 2 years ago
- ☆27Jun 4, 2024Updated last year
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- [ICLR 2025] PyTorch Implementation of "ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time"☆29Jul 20, 2025Updated 7 months ago
- Matryoshka Multimodal Models☆122Jan 22, 2025Updated last year
- ☆33Apr 22, 2025Updated 10 months ago
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆27Oct 28, 2024Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- An online AI security course created by UChicago's XLab☆30Feb 21, 2026Updated 2 weeks ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆99May 20, 2025Updated 9 months ago
- 👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)☆74Jan 20, 2025Updated last year
- Distributed Optimization Infra for learning CLIP models☆27Oct 3, 2024Updated last year
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆34Nov 13, 2024Updated last year
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models☆37Nov 10, 2024Updated last year
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training☆29Jul 2, 2024Updated last year
- Software artifacts and Demos for CS559 (Fall 2024) "Computer Graphics"☆10Dec 6, 2024Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆129Apr 4, 2025Updated 11 months ago
- ☆36May 24, 2024Updated last year
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]☆45Jul 22, 2025Updated 7 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- ☆37Nov 8, 2024Updated last year
- GPT-4V(ision) as A Social Media Analysis Engine☆38Dec 20, 2024Updated last year