Flowerfan / VistaLLaMA
☆15 · Updated 9 months ago
Alternatives and similar repositories for VistaLLaMA
Users interested in VistaLLaMA are comparing it to the repositories listed below.
- ☆14 · Updated 5 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" · ☆42 · Updated 9 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" · ☆67 · Updated 11 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" · ☆38 · Updated 3 months ago
- ☆31 · Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant · ☆63 · Updated last year
- [Pattern Recognition 2025] Cross-Modal Adapter for Vision-Language Retrieval · ☆66 · Updated last month
- Learning 1D Causal Visual Representation with De-focus Attention Networks · ☆35 · Updated last year
- The efficient tuning method for VLMs · ☆79 · Updated last year
- Compress conventional Vision-Language Pre-training data · ☆52 · Updated 2 years ago
- [NeurIPS 2024] MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models · ☆72 · Updated 4 months ago
- Official code for the paper "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter" · ☆16 · Updated 2 years ago
- ☆25 · Updated 2 years ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning · ☆22 · Updated last year
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision · ☆40 · Updated 6 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models · ☆103 · Updated 3 months ago
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning · ☆27 · Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" · ☆13 · Updated last year
- Distilling Large Vision-Language Model with Out-of-Distribution Generalizability (ICCV 2023) · ☆59 · Updated last year
- CVPR2024: Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models · ☆83 · Updated last year
- Task Residual for Tuning Vision-Language Models (CVPR 2023) · ☆73 · Updated 2 years ago
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) · ☆32 · Updated 2 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … · ☆76 · Updated last year
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning · ☆27 · Updated last month
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights · ☆28 · Updated 10 months ago
- ☆81 · Updated 10 months ago
- A curated list of Awesome Personalized Large Multimodal Models resources · ☆38 · Updated last week
- [CVPR 2025 Highlight] Official PyTorch codebase for paper: "Assessing and Learning Alignment of Unimodal Vision and Language Models" · ☆49 · Updated last month
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models · ☆18 · Updated last year
- [ACL 2025] PruneVid: Visual Token Pruning for Efficient Video Large Language Models · ☆53 · Updated 4 months ago