ugorsahin / Generative-Negative-MiningLinks
Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024
☆14Updated last year
Alternatives and similar repositories for Generative-Negative-Mining
Users that are interested in Generative-Negative-Mining are comparing it to the libraries listed below
Sorting:
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated last month
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?☆14Updated last month
- ☆58Updated last year
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Updated last year
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆30Updated 3 weeks ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆38Updated 2 weeks ago
- ☆22Updated last month
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions☆17Updated last year
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models☆29Updated 8 months ago
- [ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data☆12Updated last year
- ☆42Updated 8 months ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)☆41Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆15Updated last year
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆48Updated 3 weeks ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆21Updated last year
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024)☆32Updated 9 months ago
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆32Updated 2 months ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11Updated 2 years ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆45Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆28Updated last year
- ☆36Updated this week
- This repository houses the code for the paper - "The Neglected of VLMs"☆28Updated 2 months ago
- [NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"☆105Updated last year
- Benchmarking Multi-Image Understanding in Vision and Language Models☆11Updated 11 months ago
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models"☆13Updated this week
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights☆27Updated 8 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆24Updated 9 months ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆26Updated 6 months ago
- ☆45Updated 6 months ago
- [ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"☆20Updated 3 months ago