[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet.
★43 · Feb 10, 2026 · Updated 4 months ago
Alternatives and similar repositories for LVNet
Users who are interested in LVNet are comparing it to the repositories listed below.
- [WIP] Code for LangToMo ★21 · Mar 19, 2026 · Updated 2 months ago
- [ICLR'25] Multimodal Video Understanding Framework (MVU) ★56 · Jan 31, 2025 · Updated last year
- Code for the paper Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers ★21 · Aug 2, 2024 · Updated last year
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings ★11 · Feb 24, 2025 · Updated last year
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment ★25 · Jan 9, 2025 · Updated last year
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ★30 · Oct 27, 2025 · Updated 7 months ago
- ★14 · Feb 26, 2024 · Updated 2 years ago
- This is the official repository of LLAVIDAL ★24 · Oct 4, 2025 · Updated 8 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ★29 · Sep 27, 2024 · Updated last year
- ★14 · Jun 25, 2022 · Updated 3 years ago
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads" ★17 · Oct 6, 2025 · Updated 8 months ago
- Code for NeurIPS 2022 paper "Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space" ★20 · Apr 20, 2023 · Updated 3 years ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ★106 · Oct 27, 2024 · Updated last year
- [ACL 2026] Paper list of Video LLM hallucination. Welcome to Star and Contribute! ★34 · Jun 1, 2026 · Updated last week
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos ★37 · May 27, 2025 · Updated last year
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ★72 · Aug 4, 2024 · Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ★10 · Jul 22, 2024 · Updated last year
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. ★26 · Apr 17, 2025 · Updated last year
- [EMNLP'24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ★18 · Oct 9, 2024 · Updated last year
- We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their… ★22 · Jan 11, 2026 · Updated 4 months ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ★24 · Aug 18, 2025 · Updated 9 months ago
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ★229 · Mar 29, 2025 · Updated last year
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ★78 · Apr 28, 2025 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ★162 · Jun 23, 2025 · Updated 11 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ★33 · Mar 2, 2025 · Updated last year
- [CVPR 2025] Beacon3D: Object-centric Evaluation for 3D Grounding-QA ★28 · Nov 25, 2025 · Updated 6 months ago
- ★26 · Jun 5, 2025 · Updated last year
- ★150 · Apr 16, 2025 · Updated last year
- Neural network methods for multimodal map reconstruction and their usage for robot navigation and control ★16 · Jun 11, 2024 · Updated last year
- [NeurIPS 2025] Code for BEAST Experiments on CALVIN and LIBERO. ★38 · Jan 8, 2026 · Updated 5 months ago
- [CVPR 2025] AIGV-Assessor: Benchmarking and Evaluating the Perceptual Quality of Text-to-Video Generation with LMM ★18 · Mar 19, 2026 · Updated 2 months ago
- LinVT: Empower Your Image-level Large Language Model to Understand Videos ★84 · Dec 30, 2024 · Updated last year
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis… ★25 · Jul 21, 2024 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ★89 · Jul 1, 2024 · Updated last year
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" ★56 · Oct 10, 2024 · Updated last year
- [NeurIPS 2023] Latent Graph Inference with Limited Supervision ★33 · Feb 1, 2024 · Updated 2 years ago
- ★13 · Apr 9, 2025 · Updated last year
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity". ★17 · Jul 2, 2024 · Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ★57 · May 25, 2025 · Updated last year