umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆76 · Updated 6 months ago
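The method's core idea, per the paper title: rather than matching an image directly against class prompts, first infer a contextual attribute z (background, setting, etc.) from the image, then classify with prompts conditioned on z and combine across contexts. Below is a minimal sketch of that two-step inference, assuming OpenAI's `clip` package; the class names, context list, prompt templates, and `example.jpg` are illustrative placeholders, not the repository's actual interface.

```python
# Minimal sketch of two-step "infer context, then condition" zero-shot
# classification. NOT the authors' implementation; contexts/classes are
# hypothetical examples.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

classes = ["dog", "cat"]                       # hypothetical label set
contexts = ["on grass", "in snow", "indoors"]  # hypothetical contextual attributes

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    img_feat = model.encode_image(image)
    img_feat /= img_feat.norm(dim=-1, keepdim=True)

    # Step 1: infer the context z from the image alone -> p(z|x).
    ctx_tokens = clip.tokenize([f"a photo of something {c}" for c in contexts]).to(device)
    ctx_feat = model.encode_text(ctx_tokens)
    ctx_feat /= ctx_feat.norm(dim=-1, keepdim=True)
    p_ctx = (100.0 * img_feat @ ctx_feat.T).softmax(dim=-1)

    # Step 2: classify conditioned on each context -> p(y|x,z),
    # then marginalize over contexts weighted by p(z|x).
    probs = torch.zeros(1, len(classes), device=device)
    for j, c in enumerate(contexts):
        cls_tokens = clip.tokenize([f"a photo of a {y} {c}" for y in classes]).to(device)
        cls_feat = model.encode_text(cls_tokens)
        cls_feat /= cls_feat.norm(dim=-1, keepdim=True)
        p_cls = (100.0 * img_feat @ cls_feat.T).softmax(dim=-1)
        probs += p_ctx[0, j] * p_cls

print(dict(zip(classes, probs[0].tolist())))
```

See the repository and paper for the actual attribute sets, prompt construction, and combination rule used in the reported experiments.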
Related projects
Alternatives and complementary repositories for perceptionCLIP
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆69 · Updated last month
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 · Updated last year
- Official implementation of "Describing Differences in Image Sets with Natural Language" (CVPR 2024 Oral) ☆104 · Updated 7 months ago
- [ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models ☆81 · Updated 2 months ago
- [CVPR 2024 Highlight] Official implementation of Transferable Visual Prompting, from the paper "Exploring the Transferability of Visual Prompt… ☆32 · Updated 4 months ago
- Public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆79 · Updated 7 months ago
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization ☆95 · Updated 8 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 · Updated 2 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆33 · Updated 2 months ago
- [ICLR 2023] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆36 · Updated last year
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆45 · Updated last month
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries" ☆47 · Updated 6 months ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023) ☆65 · Updated last year
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024) ☆49 · Updated 3 weeks ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆59 · Updated 2 weeks ago
- Evaluation code for the paper "BLINK: Multimodal Large Language Models Can See but Not Perceive". https://arxiv.or… ☆107 · Updated 4 months ago
- Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning" ☆34 · Updated 8 months ago
- Official repository of the paper "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs" ☆40 · Updated 2 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆105 · Updated last week
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs ☆87 · Updated 7 months ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆40 · Updated 3 months ago
- [NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" ☆166 · Updated 8 months ago
- [ACL 2024 Oral] Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆52 · Updated last month
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆47 · Updated 10 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆27 · Updated 8 months ago
- [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆102 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆71 · Updated 6 months ago