JindongGu / Awesome-Prompting-on-Vision-Language-Model
This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models.
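For readers new to the topic, here is a minimal sketch of what "prompting" a vision-language model looks like in practice: zero-shot classification with CLIP, where the hand-written template "a photo of a {label}" is the prompt being engineered. The checkpoint name, image path, and label set below are illustrative placeholders, not taken from the survey.

```python
# Minimal sketch of zero-shot prompting with CLIP via Hugging Face
# transformers; checkpoint, image path, and labels are illustrative.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")      # any input image (placeholder path)
labels = ["cat", "dog", "bird"]        # candidate classes (placeholder set)
# The hand-crafted template below is the "prompt" being engineered.
prompts = [f"a photo of a {label}" for label in labels]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)
# Image-text similarity scores, normalized into probabilities over labels.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```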
Related projects
Alternatives and complementary repositories for Awesome-Prompting-on-Vision-Language-Model
- Recent LLM-based CV and related works. Welcome to comment/contribute!
- A curated list of prompt-based papers in computer vision and vision-language learning.
- [CVPR 2023] Official repository of the paper "MaPLe: Multi-modal Prompt Learning".
- 📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
- A survey on multimodal learning research.
- A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP (a minimal sketch of this idea appears after this list).
- A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
- A collection of papers on the topic of "Computer Vision in the Wild (CVinW)".
- Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition".
- [NeurIPS 2023] Text data, code, and pre-trained models for the paper "Improving CLIP Training with Language Rewrites".
- Recent Advances in Vision and Language Pre-training (VLP).
- [CVPR 2024 Highlight] Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding.
- Exploring Visual Prompts for Adapting Large-Scale Models.
- Awesome papers & datasets specifically focused on long-term videos.
- [CVPR 2024] Official PyTorch code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models".
- [CVPR 2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts.
- [CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want.
- Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral @ ICLR 2023).
- ❄️🔥 Visual Prompt Tuning [ECCV 2022] https://arxiv.org/abs/2203.12119
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
- Paper list about multimodal and large language models, used to record papers the author reads from the daily arXiv.
- [ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World.
- Test-time Prompt Tuning (TPT) for zero-shot generalization in vision-language models (NeurIPS 2022).
- A curated publication list on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation).
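Several of the repositories above (CoOp-style prompt/adapter learning, Visual Prompt Tuning, PromptKD, Test-time Prompt Tuning) replace hand-written templates with learnable prompt vectors. The snippet below is a rough sketch of that shared idea, not any repository's actual code: learnable context tokens are prepended to class embeddings while the backbone stays frozen. All dimensions, names, and the random placeholder class embeddings are assumptions made for illustration.

```python
# Sketch of soft prompt learning: learnable "context" vectors are prepended
# to each class-name embedding and trained while the vision-language
# backbone stays frozen. All sizes and names here are illustrative.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_ctx: int = 16, dim: int = 512, n_classes: int = 10):
        super().__init__()
        # Learnable context tokens, shared across classes (the "prompt").
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)
        # Frozen class-name embeddings would come from the text encoder's
        # tokenizer/embedding table; random placeholders stand in here.
        self.register_buffer("cls_emb", torch.randn(n_classes, 1, dim))

    def forward(self) -> torch.Tensor:
        # Build one prompt sequence per class: [ctx_1 ... ctx_n, class_token].
        ctx = self.ctx.unsqueeze(0).expand(self.cls_emb.size(0), -1, -1)
        return torch.cat([ctx, self.cls_emb], dim=1)  # (n_classes, n_ctx+1, dim)

prompt = SoftPrompt()
# Only the context vectors receive gradients; a frozen text encoder would
# consume these sequences to produce per-class text features.
optimizer = torch.optim.AdamW([prompt.ctx], lr=2e-3)
print(prompt().shape)  # torch.Size([10, 17, 512])
```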