[ICLR'25] Multimodal Video Understanding Framework (MVU)
★56 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ★36 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo ★21 · Mar 19, 2026 · Updated last month
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) ★37 · Jan 1, 2024 · Updated 2 years ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet. ★43 · Feb 10, 2026 · Updated 2 months ago
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings ★11 · Feb 24, 2025 · Updated last year
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment ★25 · Jan 9, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ★72 · Aug 4, 2024 · Updated last year
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" ★54 · Oct 10, 2024 · Updated last year
- ★36 · Dec 18, 2025 · Updated 4 months ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ★28 · Oct 27, 2025 · Updated 6 months ago
- ★18 · Dec 17, 2022 · Updated 3 years ago
- This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires python≥3.5 ★13 · Updated this week
- Code for the paper Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers ★21 · Aug 2, 2024 · Updated last year
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" ★20 · Apr 20, 2023 · Updated 3 years ago
- Fast Vision Mamba: Pool your Spatial Dimensions for Accelerated Processing ★18 · Jan 28, 2025 · Updated last year
- Official repository for "Self-Supervised Video Transformer" (CVPR'22) ★108 · Jun 26, 2024 · Updated last year
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" ★49 · Jul 7, 2024 · Updated last year
- Environments for Active Vision Reinforcement Learning ★29 · Oct 10, 2024 · Updated last year
- [NeurIPS25 D&B Spotlight] A tile-level histopathology image understanding benchmark ★45 · Apr 14, 2026 · Updated 2 weeks ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ★105 · Oct 27, 2024 · Updated last year
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) ★15 · Jul 4, 2022 · Updated 3 years ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ★29 · Sep 27, 2024 · Updated last year
- The official implementation of "Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency". ★13 · Jul 16, 2024 · Updated last year
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation ★34 · Mar 7, 2025 · Updated last year
- Pytorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models ★28 · Mar 22, 2024 · Updated 2 years ago
- Code for our ICCV 2025 paper "Adaptive Caching for Faster Video Generation with Diffusion Transformers" ★171 · Nov 5, 2024 · Updated last year
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ★48 · Jan 7, 2025 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ★159 · Jun 23, 2025 · Updated 10 months ago
- [ECCV2022] [T-PAMI] StARformer: Transformer with State-Action-Reward Representations. ★97 · May 21, 2023 · Updated 2 years ago
- [NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO. ★35 · Jan 8, 2026 · Updated 3 months ago
- ★17 · Nov 9, 2022 · Updated 3 years ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ★35 · Jun 12, 2025 · Updated 10 months ago
- ★28 · May 13, 2025 · Updated 11 months ago
- ★13 · Apr 13, 2026 · Updated 3 weeks ago
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ★348 · Jul 19, 2024 · Updated last year
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ★307 · Dec 5, 2024 · Updated last year
- ★27 · Mar 20, 2023 · Updated 3 years ago
- [ECCV 2022] Learning Instance-Specific Adaptation for Cross-Domain Segmentation ★14 · Jul 17, 2022 · Updated 3 years ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ★55 · May 25, 2025 · Updated 11 months ago