Flowerfan / VistaLLaMA
☆14 · Updated last year
Alternatives and similar repositories for VistaLLaMA
Users interested in VistaLLaMA are comparing it to the repositories listed below.
- ☆16 · Updated 10 months ago
- [CVPR 2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆69 · Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Updated last year
- (ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator ☆114 · Updated 10 months ago
- ☆83 · Updated last year
- [NeurIPS 2024] The official code of paper "Automated Multi-level Preference for MLLMs" ☆21 · Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆46 · Updated last year
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models ☆79 · Updated last month
- ☆26 · Updated 2 years ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning ☆26 · Updated 3 years ago
- Compress conventional Vision-Language Pre-training data ☆53 · Updated 2 years ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆49 · Updated last week
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆32 · Updated 2 years ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆80 · Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆42 · Updated 11 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ☆42 · Updated 3 months ago
- [ECCV 2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models ☆20 · Updated last year
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆41 · Updated 6 months ago
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering ☆53 · Updated 6 months ago
- ☆32 · Updated last year
- Turning to Video for Transcript Sorting ☆49 · Updated 2 years ago
- [ICCV 2023 Oral] This is the official repository for our paper "Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning". ☆75 · Updated 2 years ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆134 · Updated 6 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆68 · Updated last year
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval ☆16 · Updated 3 years ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆12 · Updated last year
- ☆46 · Updated last year
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning ☆28 · Updated last year
- [CVPR 2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆108 · Updated 8 months ago
- The official implementation of "MLLMs-Augmented Visual-Language Representation Learning" ☆31 · Updated last year