[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
★99 · Apr 14, 2025 · Updated last year
Alternatives and similar repositories for VideoGLaMM
Users who are interested in VideoGLaMM are comparing it to the repositories listed below.
- [ECCVW 2024 -- ORAL] Official repository of paper titled "Makeup-Guided Facial Privacy Protection via Untrained Neural Network Priors". ★12 · Oct 11, 2024 · Updated last year
- ★11 · Oct 29, 2024 · Updated last year
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo… ★50 · Aug 23, 2024 · Updated last year
- [CVPRW 2025] Official repository of paper titled "Towards Evaluating the Robustness of Visual State Space Models" ★26 · Jun 8, 2025 · Updated 10 months ago
- [MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology ★12 · Jun 17, 2025 · Updated 9 months ago
- Official code repository of paper titled "Test-Time Low Rank Adaptation via Confidence Maximization for Zero-Shot Generalization of Visio… ★34 · May 11, 2025 · Updated 11 months ago
- [NAACL'25] Contains code and documentation for our VANE-Bench paper. ★23 · Aug 19, 2025 · Updated 7 months ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models ★15 · Nov 1, 2024 · Updated last year
- [ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes ★37 · Jan 21, 2025 · Updated last year
- (ICCV 2023) Generative Multiplane Neural Radiance for 3D Aware Image Generation. ★19 · Sep 28, 2023 · Updated 2 years ago
- [MICCAI 2023][Early Accept] Official code repository of paper titled "Cross-modulated Few-shot Image Generation for Colorectal Tissue Cla… ★47 · Sep 28, 2023 · Updated 2 years ago
- [CVPR 2023] Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection ★30 · Jun 21, 2023 · Updated 2 years ago
- PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models ★263 · Aug 5, 2025 · Updated 8 months ago
- VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos ★23 · Jan 26, 2026 · Updated 2 months ago
- A new multi-task learning framework using Vision Transformers ★11 · Jun 19, 2024 · Updated last year
- ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark ★17 · May 25, 2025 · Updated 10 months ago
- [MICCAI 2023] Official code repository of paper titled "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation"… ★52 · Nov 14, 2023 · Updated 2 years ago
- [ICLR 2024] Official code for the paper "LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts" ★84 · May 18, 2024 · Updated last year
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ★54 · Jul 5, 2025 · Updated 9 months ago
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024] ★22 · Oct 27, 2024 · Updated last year
- [CVPR 2025 🔥] ALM-Bench is a multilingual multi-modal diverse cultural benchmark for 100 languages across 19 categories. It assesses the… ★46 · May 26, 2025 · Updated 10 months ago
- [EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabi… ★79 · Sep 24, 2024 · Updated last year
- ★42 · Nov 9, 2023 · Updated 2 years ago
- This repository contains the official source code for SALT: Parameter-Efficient Fine-Tuning via Singular Value Adaptation with Low-Rank T… ★29 · Nov 29, 2025 · Updated 4 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ★951 · Aug 5, 2025 · Updated 8 months ago
- This repository contains the code for Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics (MICCAI'24) ★27 · Mar 22, 2025 · Updated last year
- [MICCAI 2024] Official code repository of paper titled "BAPLe: Backdoor Attacks on Medical Foundation Models using Prompt Learning" accep… ★56 · Oct 22, 2024 · Updated last year
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation" ★14 · Nov 1, 2024 · Updated last year
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization ★110 · Feb 11, 2024 · Updated 2 years ago
- Composed Video Retrieval ★62 · May 2, 2024 · Updated last year
- [ACL 2025 🔥] Time Travel is a Comprehensive Benchmark to Evaluate LMMs on Historical and Cultural Artifacts ★19 · May 22, 2025 · Updated 10 months ago
- ★25 · Mar 13, 2026 · Updated last month
- [CVPR 2025 Highlight] Official Implementation of the paper STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing fro… ★29 · Apr 22, 2025 · Updated 11 months ago
- [ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models ★29 · Oct 20, 2025 · Updated 5 months ago
- Code of the Grounded MUIE model, REAMO ★11 · Dec 3, 2024 · Updated last year
- ★70 · Jul 2, 2025 · Updated 9 months ago
- [NAACL 2025 🔥] CAMEL-Bench is an Arabic benchmark for evaluating multimodal models across eight domains with 29,000 questions. ★38 · Apr 17, 2025 · Updated 11 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity" ★33 · Oct 12, 2024 · Updated last year
- [CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners". ★305 · Apr 3, 2024 · Updated 2 years ago