StanfordMIMI / villa
ViLLA: Fine-grained vision-language representation learning from real-world data
☆39 · Updated last year
Alternatives and similar repositories for villa:
Users interested in villa are comparing it to the libraries listed below
- [ICML 2024] Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning ☆45 · Updated 8 months ago
- [ICLR 2023] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆37 · Updated last year
- ☆19 · Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆19 · Updated 4 months ago
- [ACL 2023] MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning ☆35 · Updated 5 months ago
- [ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Updated last year
- Code for the paper "VL-ICL Bench: The Devil in the Details of Benchmarking Multimodal In-Context Learning" ☆34 · Updated last week
- [CVPR 2023] The official dataset of "Advancing Visual Grounding with Scene Knowledge: Benchmark and Method" ☆29 · Updated last year
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…" ☆18 · Updated last year
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆44 · Updated 5 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ☆19 · Updated 3 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆24 · Updated last month
- [ICCV 2023] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models ☆27 · Updated last year
- [ICLR 2024] The official implementation of the paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆71 · Updated 11 months ago
- ☆29 · Updated last year
- Code release of F-LMM: Grounding Frozen Large Multimodal Models ☆60 · Updated 5 months ago
- [ICML 2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆20 · Updated 2 weeks ago
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆32 · Updated 2 months ago
- NegCLIP ☆29 · Updated last year
- Compress conventional Vision-Language Pre-training data ☆49 · Updated last year
- The released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" ☆32 · Updated last year
- ☆57 · Updated last year
- ☆87 · Updated last year
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization ☆103 · Updated 11 months ago
- Preference Learning for LLaVA ☆29 · Updated 2 months ago
- [arXiv 2024] AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention ☆21 · Updated 6 months ago
- ☆29 · Updated last month
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆60 · Updated 7 months ago
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆78 · Updated 8 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆56 · Updated last year