[ICLR'25] Multimodal Video Understanding Framework (MVU)
☆56 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆36 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo ☆21 · Mar 19, 2026 · Updated 2 months ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet. ☆43 · Feb 10, 2026 · Updated 3 months ago
- An unofficial PyTorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment ☆25 · Jan 9, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ☆72 · Aug 4, 2024 · Updated last year
- ☆38 · Dec 18, 2025 · Updated 5 months ago
- ☆14 · Jun 25, 2022 · Updated 3 years ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ☆30 · Oct 27, 2025 · Updated 6 months ago
- ☆18 · Dec 17, 2022 · Updated 3 years ago
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" ☆20 · Apr 20, 2023 · Updated 3 years ago
- Fast Vision Mamba: Pool your Spatial Dimensions for Accelerated Processing ☆19 · Jan 28, 2025 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ☆229 · Mar 29, 2025 · Updated last year
- Official repository for "Self-Supervised Video Transformer" (CVPR'22) ☆108 · Jun 26, 2024 · Updated last year
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" ☆50 · Jul 7, 2024 · Updated last year
- Environments for Active Vision Reinforcement Learning ☆29 · Oct 10, 2024 · Updated last year
- [NeurIPS25 D&B Spotlight] A tile-level histopathology image understanding benchmark ☆46 · May 7, 2026 · Updated 2 weeks ago
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. ☆25 · Apr 17, 2025 · Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆105 · Oct 27, 2024 · Updated last year
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning ☆276 · Nov 6, 2025 · Updated 6 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 · Sep 27, 2024 · Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation ☆34 · Mar 7, 2025 · Updated last year
- CVPR 2024: Learned representation-guided diffusion models for large-image generation ☆62 · Oct 8, 2024 · Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ☆28 · Mar 22, 2024 · Updated 2 years ago
- Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers" ☆171 · Nov 5, 2024 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆160 · Jun 23, 2025 · Updated 11 months ago
- [ECCV2022] [T-PAMI] StARformer: Transformer with State-Action-Reward Representations. ☆97 · May 21, 2023 · Updated 3 years ago
- Official implementation of PARIS3D (Accepted to ECCV 2024). ☆27 · Sep 25, 2024 · Updated last year
- [NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO. ☆36 · Jan 8, 2026 · Updated 4 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆35 · Jun 12, 2025 · Updated 11 months ago
- ☆13 · Apr 13, 2026 · Updated last month
- Uncertainty-Guided Pseudo-Labelling with Model Averaging ☆11 · Mar 17, 2026 · Updated 2 months ago
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆348 · Jul 19, 2024 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆311 · Dec 5, 2024 · Updated last year
- ☆28 · Mar 20, 2023 · Updated 3 years ago
- [ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation ☆14 · Jul 17, 2022 · Updated 3 years ago
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads" ☆17 · Oct 6, 2025 · Updated 7 months ago
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024] ☆22 · Oct 27, 2024 · Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆56 · May 25, 2025 · Updated 11 months ago
- This is the official implementation of the EMNLP Findings paper, VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatia… ☆25 · Apr 7, 2026 · Updated last month