[ICLR'25] Multimodal Video Understanding Framework (MVU)
⭐ 55 · Jan 31, 2025 · Updated last year
Alternatives and similar repositories for mvu
Users that are interested in mvu are comparing it to the libraries listed below
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" · ⭐ 34 · Jun 17, 2024 · Updated last year
- [WIP] Code for LangToMo · ⭐ 20 · Jun 25, 2025 · Updated 8 months ago
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet · ⭐ 42 · Feb 10, 2026 · Updated 3 weeks ago
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) · ⭐ 37 · Jan 1, 2024 · Updated 2 years ago
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings · ⭐ 11 · Feb 24, 2025 · Updated last year
- An unofficial PyTorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment · ⭐ 24 · Jan 9, 2025 · Updated last year
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos · ⭐ 28 · Oct 27, 2025 · Updated 4 months ago
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" · ⭐ 54 · Oct 10, 2024 · Updated last year
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" · ⭐ 20 · Apr 20, 2023 · Updated 2 years ago
- This is a Python library. Install with "python3 -m pip install rp", then run with "python3 -m rp" or just "rp". Requires Python ≥ 3.5 · ⭐ 13 · Feb 16, 2026 · Updated 2 weeks ago
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025 · ⭐ 21 · Apr 17, 2025 · Updated 10 months ago
- Environments for Active Vision Reinforcement Learning · ⭐ 28 · Oct 10, 2024 · Updated last year
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models · ⭐ 28 · Mar 22, 2024 · Updated last year
- ⭐ 30 · Dec 18, 2025 · Updated 2 months ago
- ⭐ 14 · Jun 25, 2022 · Updated 3 years ago
- WACV 2024: "PathLDM: Text conditioned Latent Diffusion Model for Histopathology" · ⭐ 48 · Jul 7, 2024 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy · ⭐ 227 · Mar 29, 2025 · Updated 11 months ago
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) · ⭐ 15 · Jul 4, 2022 · Updated 3 years ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension · ⭐ 69 · May 31, 2024 · Updated last year
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh… · ⭐ 19 · Dec 30, 2025 · Updated 2 months ago
- Web-grounded natural language instructions · ⭐ 18 · Nov 25, 2024 · Updated last year
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding · ⭐ 347 · Jul 19, 2024 · Updated last year
- ⭐ 24 · May 13, 2025 · Updated 9 months ago
- Polygonal simulation environment and implementation of the STAA*, DWA, and a PD controller for nonholonomic agents · ⭐ 18 · Oct 11, 2024 · Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation" · ⭐ 54 · May 25, 2025 · Updated 9 months ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents · ⭐ 48 · Feb 27, 2025 · Updated last year
- MCPL: MULTI-CONCEPT PROMPT LEARNING · ⭐ 20 · May 27, 2024 · Updated last year
- Implementation of the model "MC-ViT" from the paper "Memory Consolidation Enables Long-Context Video Understanding" · ⭐ 27 · Jan 17, 2026 · Updated last month
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) · ⭐ 298 · Dec 5, 2024 · Updated last year
- This project allows you to plug in a GitHub repository URL, generate vectors for an LLM and use ChatGPT models to interact. The main frame… · ⭐ 19 · Jun 4, 2023 · Updated 2 years ago
- LLM prompt augmentation with RAG, integrating external custom data from a variety of sources and allowing chat with such documents · ⭐ 20 · Jul 22, 2024 · Updated last year
- Code repository for the paper "Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass" · ⭐ 21 · Aug 22, 2024 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" · ⭐ 154 · Jun 23, 2025 · Updated 8 months ago
- PyTorch implementation for Egoinstructor at CVPR 2024 · ⭐ 28 · Dec 1, 2024 · Updated last year
- This is the official repository of LLAVIDAL · ⭐ 23 · Oct 4, 2025 · Updated 5 months ago
- CVPR 2024: Learned representation-guided diffusion models for large-image generation · ⭐ 60 · Oct 8, 2024 · Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs · ⭐ 26 · Jan 14, 2025 · Updated last year
- ⭐ 27 · Apr 11, 2025 · Updated 10 months ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" · ⭐ 64 · Oct 19, 2024 · Updated last year