[ICLR'25] Multimodal Video Understanding Framework (MVU)
☆ 56 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" · ☆ 36 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo · ☆ 20 · Mar 19, 2026 · Updated 3 weeks ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) · ☆ 37 · Jan 1, 2024 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet · ☆ 43 · Feb 10, 2026 · Updated 2 months ago
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings · ☆ 11 · Feb 24, 2025 · Updated last year
- An unofficial PyTorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment · ☆ 25 · Jan 9, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning · ☆ 71 · Aug 4, 2024 · Updated last year
- ☆ 34 · Dec 18, 2025 · Updated 3 months ago
- ☆ 14 · Jun 25, 2022 · Updated 3 years ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos · ☆ 28 · Oct 27, 2025 · Updated 5 months ago
- ☆ 18 · Dec 17, 2022 · Updated 3 years ago
- This is a Python library. Install with "python3 -m pip install rp", then run with "python3 -m rp" or just "rp". Requires Python ≥ 3.5. · ☆ 13 · Mar 17, 2026 · Updated 3 weeks ago
- Fast Vision Mamba: Pool your Spatial Dimensions for Accelerated Processing · ☆ 18 · Jan 28, 2025 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ☆ 227 · Mar 29, 2025 · Updated last year
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. · ☆ 22 · Apr 17, 2025 · Updated 11 months ago
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" · ☆ 49 · Jul 7, 2024 · Updated last year
- Environments for Active Vision Reinforcement Learning · ☆ 29 · Oct 10, 2024 · Updated last year
- [NeurIPS25 D&B Spotlight] A tile-level histopathology image understanding benchmark · ☆ 45 · Apr 3, 2026 · Updated last week
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" · ☆ 105 · Oct 27, 2024 · Updated last year
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) · ☆ 15 · Jul 4, 2022 · Updated 3 years ago
- The official implementation of "Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency". · ☆ 13 · Jul 16, 2024 · Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation · ☆ 34 · Mar 7, 2025 · Updated last year
- CVPR 2024: Learned representation-guided diffusion models for large-image generation · ☆ 62 · Oct 8, 2024 · Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models · ☆ 28 · Mar 22, 2024 · Updated 2 years ago
- Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers" · ☆ 171 · Nov 5, 2024 · Updated last year
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. · ☆ 46 · Jan 7, 2025 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ☆ 159 · Jun 23, 2025 · Updated 9 months ago
- Official implementation of PARIS3D (Accepted to ECCV 2024). · ☆ 27 · Sep 25, 2024 · Updated last year
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" · ☆ 26 · Jan 27, 2025 · Updated last year
- ☆ 27 · May 13, 2025 · Updated 11 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" · ☆ 35 · Jun 12, 2025 · Updated 10 months ago
- ☆ 13 · Updated this week
- Uncertainty-Guided Pseudo-Labelling with Model Averaging · ☆ 11 · Mar 17, 2026 · Updated 3 weeks ago
- (CVPR 2024) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ☆ 350 · Jul 19, 2024 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ☆ 304 · Dec 5, 2024 · Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". · ☆ 55 · May 25, 2025 · Updated 10 months ago
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads" · ☆ 17 · Oct 6, 2025 · Updated 6 months ago
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [Elsevier AIM2024] · ☆ 22 · Oct 27, 2024 · Updated last year
- This is the official implementation of the EMNLP Findings paper VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatia… · ☆ 25 · Apr 7, 2026 · Updated last week