g-luo / geolocation_via_guidebook_grounding
G^3: Geolocation via Guidebook Grounding, Findings of EMNLP 2022
☆17 · Updated 8 months ago
Alternatives and similar repositories for geolocation_via_guidebook_grounding
Users interested in geolocation_via_guidebook_grounding are comparing it to the repositories listed below.
- Sapsucker Woods 60 Audiovisual Dataset ☆15 · Updated 2 years ago
- Command-line tool for downloading and extending the RedCaps dataset ☆47 · Updated last year
- [CVPR 2024 Highlight] Polos: Multimodal Metric Learning from Human Feedback for Image Captioning ☆30 · Updated 2 weeks ago
- ☆31 · Updated last year
- [ICCV 2023 Oral] Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities ☆40 · Updated 9 months ago
- [ICML 2024] Fool Your (Vision and) Language Model with Embarrassingly Simple Permutations ☆14 · Updated last year
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ☆61 · Updated 3 months ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022) ☆34 · Updated 2 years ago
- The SVO-Probes Dataset for Verb Understanding ☆31 · Updated 3 years ago
- CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations ☆28 · Updated last year
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆66 · Updated 3 years ago
- **FOCI**: a benchmark for Fine-grained Object ClassIfication for large vision-language models (LVLMs) ☆16 · Updated 11 months ago
- Official code release for "Diagnosing and Rectifying Vision Models using Language" (ICLR 2023) ☆33 · Updated last year
- Code and data setup for the paper "Are Diffusion Models Vision-and-Language Reasoners?" ☆32 · Updated last year
- Accompanying repo for CVPRW 2024 "Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs" ☆27 · Updated 2 weeks ago
- (ACL 2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 10 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆34 · Updated 9 months ago
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral) ☆48 · Updated last year
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning" ☆52 · Updated last year
- Code and dataset release for Park et al., "Robust Change Captioning" (ICCV 2019) ☆48 · Updated 2 years ago
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks ☆48 · Updated 2 months ago
- Sparse autoencoders for vision ☆33 · Updated this week
- Official repository of the paper "Subobject-level Image Tokenization" ☆73 · Updated 2 months ago
- Code for the paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆19 · Updated 2 years ago
- ☆32 · Updated last year
- [EMNLP 2018] "Learning to Describe Differences Between Pairs of Similar Images", Harsh Jhamtani and Taylor Berg-Kirkpatrick ☆63 · Updated 5 years ago
- Implementation of CounterCurate, a data curation pipeline for both physical and semantic counterfactual image-caption pairs ☆18 · Updated 11 months ago
- [CVPR 2023] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! ☆16 · Updated last year
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆119 · Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆24 · Updated 6 months ago