π€ [ICLR'25] Multimodal Video Understanding Framework (MVU)
β56Jan 31, 2025Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding"β36Jun 17, 2024Updated 2 years ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23)β37Jan 1, 2024Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] ποΈ LVNet.β44Feb 10, 2026Updated 4 months ago
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddingsβ11Feb 24, 2025Updated last year
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodimentβ25Jan 9, 2025Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer β’ AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learningβ71Aug 4, 2024Updated last year
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability"β56Oct 10, 2024Updated last year
- β39Jun 2, 2026Updated last month
- β14Jun 25, 2022Updated 4 years ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videosβ31Oct 27, 2025Updated 8 months ago
- β18Dec 17, 2022Updated 3 years ago
- This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires pythonβ₯3.5β13Jun 3, 2026Updated last month
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space"β20Apr 20, 2023Updated 3 years ago
- Fast Vision Mamba : Pool your Spatial Dimensions for Accelerated Processingβ19Jan 28, 2025Updated last year
- Managed Database hosting by DigitalOcean β’ AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policyβ229Mar 29, 2025Updated last year
- Official repository for "Self-Supervised Video Transformer" (CVPR'22)β109Jun 26, 2024Updated 2 years ago
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology"β51Jul 7, 2024Updated last year
- Environments for Active Vision Reinforcement Learningβ30Oct 10, 2024Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"β106Oct 27, 2024Updated last year
- Theia: Distilling Diverse Vision Foundation Models for Robot Learningβ277Nov 6, 2025Updated 7 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignmentβ29Sep 27, 2024Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentationβ34Mar 7, 2025Updated last year
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Modelsβ28Mar 22, 2024Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways β’ AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.β48Jan 7, 2025Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"β165Jun 23, 2025Updated last year
- [ECCV2022] [T-PAMI] StARformer: Transformer with State-Action-Reward Representations.β97May 21, 2023Updated 3 years ago
- Official implementation of PARIS3D (Accepted to ECCV 2024).β27Sep 25, 2024Updated last year
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"β26Jan 27, 2025Updated last year
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"β36Jun 12, 2025Updated last year
- β29May 13, 2025Updated last year
- β13Apr 13, 2026Updated 2 months ago
- Uncertainty-Guided Pseudo-Labelling with Model Averagingβ11Mar 17, 2026Updated 3 months ago
- Deploy on Railway without the complexity - Free Credits Offer β’ AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understandingβ350Jul 19, 2024Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)β318Dec 5, 2024Updated last year
- β28Mar 20, 2023Updated 3 years ago
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads"β17Oct 6, 2025Updated 8 months ago
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024]β22Oct 27, 2024Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".β58May 25, 2025Updated last year
- This is the official impletations of the EMNLP Findings paper, VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatiaβ¦β24Apr 7, 2026Updated 2 months ago