[ICLR'25] Multimodal Video Understanding Framework (MVU)
★56 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" · ★36 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo · ★21 · Mar 19, 2026 · Updated 2 months ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) · ★37 · Jan 1, 2024 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet. · ★43 · Feb 10, 2026 · Updated 4 months ago
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings · ★11 · Feb 24, 2025 · Updated last year
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment · ★25 · Jan 9, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning · ★72 · Aug 4, 2024 · Updated last year
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" · ★56 · Oct 10, 2024 · Updated last year
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos · ★30 · Oct 27, 2025 · Updated 7 months ago
- ★18 · Dec 17, 2022 · Updated 3 years ago
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ★229 · Mar 29, 2025 · Updated last year
- Official repository for "Self-Supervised Video Transformer" (CVPR'22) · ★109 · Jun 26, 2024 · Updated last year
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" · ★51 · Jul 7, 2024 · Updated last year
- [NeurIPS25 D&B Spotlight] A tile-level histopathology image understanding benchmark · ★47 · Updated this week
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. · ★26 · Apr 17, 2025 · Updated last year
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" · ★106 · Oct 27, 2024 · Updated last year
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning · ★276 · Nov 6, 2025 · Updated 7 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment · ★29 · Sep 27, 2024 · Updated last year
- The official implementation of "Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency". · ★13 · Jul 16, 2024 · Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation · ★34 · Mar 7, 2025 · Updated last year
- Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers" · ★171 · Nov 5, 2024 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ★164 · Jun 23, 2025 · Updated 11 months ago
- Official implementation of PARIS3D (Accepted to ECCV 2024). · ★27 · Sep 25, 2024 · Updated last year
- [NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO. · ★38 · Jan 8, 2026 · Updated 5 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" · ★36 · Jun 12, 2025 · Updated last year
- ★29 · May 13, 2025 · Updated last year
- ★13 · Apr 13, 2026 · Updated 2 months ago
- Uncertainty-Guided Pseudo-Labelling with Model Averaging · ★11 · Mar 17, 2026 · Updated 2 months ago
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ★350 · Jul 19, 2024 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ★315 · Dec 5, 2024 · Updated last year
- ★28 · Mar 20, 2023 · Updated 3 years ago
- [ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation · ★14 · Jul 17, 2022 · Updated 3 years ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". · ★57 · May 25, 2025 · Updated last year
- This is the official implementation of the EMNLP Findings paper, VideoINSTA: Zero-shot Long-Form Video Understanding via Informative Spatia… · ★25 · Apr 7, 2026 · Updated 2 months ago
- [CVPR 2025] PyTorch implementation of T-CORE, introduced in "When the Future Becomes the Past: Taming Temporal Correspondence for Self-su… · ★19 · Nov 4, 2025 · Updated 7 months ago
- Official repository for "Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition" [ICCV 2023] · ★102 · Apr 30, 2024 · Updated 2 years ago
- ★108 · Jul 30, 2024 · Updated last year
- [ICML 2025] This is the official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv… · ★27 · Jun 16, 2025 · Updated 11 months ago
- Source code for the paper "Do Deep Neural Network Solutions form a Star Domain?" · ★12 · May 26, 2024 · Updated 2 years ago