mbzuai-oryx / VideoGLaMM
A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
☆34Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for VideoGLaMM
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".☆43Updated 3 months ago
- Composed Video Retrieval☆46Updated 6 months ago
- Official implementation of the paper "STEREO: Towards Adversarially Robust Concept Erasing from Text-to-Image Generation Models"☆15Updated 2 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆32Updated last week
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization☆97Updated 9 months ago
- Contains code and documentation for our VANE-Bench paper.☆10Updated 5 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆51Updated 3 months ago
- ☆16Updated last year
- Implementation of the paper "PerSense: Personalized Instance Segmentation in Dense Images"☆21Updated last month
- Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio…☆19Updated 3 months ago
- FreeDA: Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024)☆29Updated 2 months ago
- [CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".☆91Updated 3 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision☆24Updated last month
- Towards Evaluating the Robustness of Visual State Space Models☆22Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant☆49Updated 5 months ago
- ☆41Updated last year
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions☆106Updated 3 weeks ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆78Updated 8 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆52Updated this week
- [CVPR2024 Highlight] Official implementation for Transferable Visual Prompting. The paper "Exploring the Transferability of Visual Prompt…☆32Updated 5 months ago
- [ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes 🚀🚀🚀☆32Updated last month
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆30Updated last month
- Code for paper "AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention"☆16Updated 4 months ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023)☆66Updated last year
- [MICCAI 2023][Early Accept] Official code repository of paper titled "Cross-modulated Few-shot Image Generation for Colorectal Tissue Cla…☆45Updated last year
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆63Updated 5 months ago
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention☆44Updated 2 weeks ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.☆22Updated 6 months ago
- ☆21Updated last month
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)☆52Updated last month