facebookresearch / selective-vqa_ood
Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs/2306.08751)
☆24Updated last year
Alternatives and similar repositories for selective-vqa_ood:
Users who are interested in selective-vqa_ood are comparing it to the libraries listed below.
- [CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings☆45Updated last year
- Preference Learning for LLaVA☆37Updated 3 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆35Updated 6 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 6 months ago
- [Under Review] Official PyTorch implementation code for realizing the technical part of Phantom of Latent representing equipped with enla…☆49Updated 4 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆20Updated 3 weeks ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models☆29Updated 4 months ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT☆53Updated 6 months ago
- FuseCap: Large Language Model for Visual Data Fusion in Enriched Caption Generation☆53Updated 10 months ago
- [ICLR2024] Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model☆41Updated last month
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- Code and Models for "GeneCIS: A Benchmark for General Conditional Image Similarity"☆56Updated last year
- Official repository of the paper "Subobject-level Image Tokenization"☆65Updated 9 months ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning☆38Updated last year
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆62Updated 6 months ago
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Updated 3 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆41Updated last month
- [ECCV 2024] Parrot Captions Teach CLIP to Spot Text☆63Updated 5 months ago
- VideoHallucer: the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆26Updated 7 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 8 months ago
- An official codebase for the paper "CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos" (ICCV 23)☆52Updated last year
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models☆43Updated 8 months ago