umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆76Updated 6 months ago
Related projects ⓘ
Alternatives and complementary repositories for perceptionCLIP
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- ☆25Updated last year
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆70Updated 2 months ago
- This repository houses the code for the paper - "The Neglected of VLMs"☆23Updated 3 months ago
- [CVPR 2024 Highlight] ImageNet-D☆38Updated last month
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆61Updated last month
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆50Updated 6 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆49Updated 3 months ago
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆32Updated 4 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆110Updated 3 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆47Updated 10 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".☆47Updated 6 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆106Updated 3 weeks ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…☆98Updated 6 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆33Updated 3 months ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023)☆66Updated last year
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral)☆109Updated 7 months ago
- ☆30Updated 9 months ago
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)☆80Updated 8 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆112Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆28Updated 8 months ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)☆37Updated 11 months ago
- ☆55Updated 4 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".☆34Updated 8 months ago
- ☆90Updated 6 months ago
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"☆54Updated last year
- ☆25Updated last month
- [ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models☆82Updated 3 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation☆56Updated 2 months ago
- Official Repository of Personalized Visual Instruct Tuning☆24Updated 2 weeks ago