g-luo / geolocation_via_guidebook_grounding
G^3: Geolocation via Guidebook Grounding, Findings of EMNLP 2022
☆17Updated 7 months ago
Alternatives and similar repositories for geolocation_via_guidebook_grounding:
Users who are interested in geolocation_via_guidebook_grounding are comparing it to the libraries listed below
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"☆32Updated last year
- [CVPR24 Highlights] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning☆30Updated 9 months ago
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆61Updated last month
- Official Code Release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023)☆33Updated last year
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- The SVO-Probes Dataset for Verb Understanding☆31Updated 3 years ago
- Implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs.☆18Updated 10 months ago
- [ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"☆25Updated 2 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 5 months ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆35Updated last month
- NegCLIP.☆31Updated 2 years ago
- Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'☆13Updated 9 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆44Updated last year
- An instruction data generation system for multimodal language models.☆33Updated 2 months ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities☆40Updated 7 months ago
- Code, data, models for the Sherlock corpus☆57Updated 2 years ago
- Command-line tool for downloading and extending the RedCaps dataset.☆46Updated last year
- [ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"☆64Updated 9 months ago
- Repository for the paper "Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models"☆27Updated last year
- Code and results accompanying our paper titled CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets☆57Updated last year
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆35Updated 8 months ago
- Code and data for EMNLP 2023 paper "Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?"☆12Updated last year
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality☆79Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025)☆28Updated 3 weeks ago
- Repo for the paper "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 2023 Spotlight)☆37Updated last year
- Code, Data and Red Teaming for ZeroBench☆45Updated 2 months ago