[ICLR'25] Multimodal Video Understanding Framework (MVU)
★57 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users who are interested in mvu are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" · ★36 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo · ★20 · Updated this week
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) · ★37 · Jan 1, 2024 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet. · ★42 · Feb 10, 2026 · Updated last month
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings · ★11 · Feb 24, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning · ★70 · Aug 4, 2024 · Updated last year
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" · ★54 · Oct 10, 2024 · Updated last year
- ★31 · Dec 18, 2025 · Updated 3 months ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos · ★28 · Oct 27, 2025 · Updated 4 months ago
- ★18 · Dec 17, 2022 · Updated 3 years ago
- This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires python≥3.5 · ★13 · Mar 17, 2026 · Updated last week
- Code for the paper Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers · ★21 · Aug 2, 2024 · Updated last year
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" · ★20 · Apr 20, 2023 · Updated 2 years ago
- Fast Vision Mamba: Pool your Spatial Dimensions for Accelerated Processing · ★17 · Jan 28, 2025 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ★226 · Mar 29, 2025 · Updated 11 months ago
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. · ★22 · Apr 17, 2025 · Updated 11 months ago
- Official repository for "Self-Supervised Video Transformer" (CVPR'22) · ★108 · Jun 26, 2024 · Updated last year
- Environments for Active Vision Reinforcement Learning · ★29 · Oct 10, 2024 · Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" · ★106 · Oct 27, 2024 · Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment · ★29 · Sep 27, 2024 · Updated last year
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning · ★271 · Nov 6, 2025 · Updated 4 months ago
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) · ★15 · Jul 4, 2022 · Updated 3 years ago
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation · ★34 · Mar 7, 2025 · Updated last year
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models · ★28 · Mar 22, 2024 · Updated 2 years ago
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ★158 · Jun 23, 2025 · Updated 9 months ago
- Polygonal simulation environment and implementation of the STAA*, DWA, and a PD controller for nonholonomic agents. · ★18 · Oct 11, 2024 · Updated last year
- ★27 · May 13, 2025 · Updated 10 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" · ★35 · Jun 12, 2025 · Updated 9 months ago
- ★13 · Aug 7, 2025 · Updated 7 months ago
- Uncertainty-Guided Pseudo-Labelling with Model Averaging · ★11 · Mar 17, 2026 · Updated last week
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ★348 · Jul 19, 2024 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ★304 · Dec 5, 2024 · Updated last year
- [ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation · ★14 · Jul 17, 2022 · Updated 3 years ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". · ★55 · May 25, 2025 · Updated 10 months ago
- CLiC: Concept Learning in Context · ★10 · Jan 24, 2025 · Updated last year
- This is the official implementation of the EMNLP Findings paper, VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatia… · ★25 · Nov 15, 2024 · Updated last year
- [CVPR 2025] PyTorch implementation of T-CORE, introduced in "When the Future Becomes the Past: Taming Temporal Correspondence for Self-su… · ★17 · Nov 4, 2025 · Updated 4 months ago
- ★20 · Mar 10, 2025 · Updated last year
- [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv… · ★27 · Jun 16, 2025 · Updated 9 months ago