[Main Conference @ EACL'26] [Workshop @ NeurIPS'24] LVNet.
☆42 · Feb 10, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for LVNet
Users who are interested in LVNet are comparing it to the repositories listed below
- [WIP] Code for LangToMo ☆20 · Jun 25, 2025 · Updated 8 months ago
- [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆55 · Jan 31, 2025 · Updated last year
- Code for the paper Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers ☆21 · Aug 2, 2024 · Updated last year
- [ECCV 2024] Official Implementation of CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings ☆11 · Feb 24, 2025 · Updated last year
- An unofficial pytorch dataloader for Open X-Embodiment Datasets https://github.com/google-deepmind/open_x_embodiment ☆23 · Jan 9, 2025 · Updated last year
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ☆28 · Oct 27, 2025 · Updated 4 months ago
- ☆18 · Dec 17, 2022 · Updated 3 years ago
- Official Repository of "Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads" ☆17 · Oct 6, 2025 · Updated 4 months ago
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆34 · Jun 17, 2024 · Updated last year
- Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23) ☆37 · Jan 1, 2024 · Updated 2 years ago
- This repository contains the implementation for our work "TopoDiffusionNet: A Topology-aware Diffusion Model", accepted to ICLR 2025. ☆21 · Apr 17, 2025 · Updated 10 months ago
- This is a python library. Install with "python3 -m pip install rp" then run with "python3 -m rp" or just "rp". Requires python≥3.5 ☆13 · Feb 16, 2026 · Updated last week
- Environments for Active Vision Reinforcement Learning ☆28 · Oct 10, 2024 · Updated last year
- ☆14 · Jun 25, 2022 · Updated 3 years ago
- ☆13 · Feb 26, 2024 · Updated 2 years ago
- [ICRA'24] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning ☆70 · Aug 4, 2024 · Updated last year
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 · Sep 27, 2024 · Updated last year
- ✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi… ☆77 · Apr 28, 2025 · Updated 10 months ago
- Video + CLIP Baseline for Ego4D Long Term Action Anticipation Challenge (CVPR 2022) ☆15 · Jul 4, 2022 · Updated 3 years ago
- TT-SPN: Twin Transformers with Sinusoidal Representation Networks for Video Instance Segmentation ☆16 · Oct 8, 2021 · Updated 4 years ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis… ☆24 · Jul 21, 2024 · Updated last year
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆23 · Aug 18, 2025 · Updated 6 months ago
- [EMNLP'24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Oct 9, 2024 · Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆46 · Apr 29, 2024 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆83 · Jul 1, 2024 · Updated last year
- Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆154 · Jun 23, 2025 · Updated 8 months ago
- Pose driven attention mechanism ☆45 · Mar 31, 2022 · Updated 3 years ago
- Code for NeurIPS 2023 paper "Active Vision Reinforcement Learning with Limited Visual Observability" ☆54 · Oct 10, 2024 · Updated last year
- [ICLR'25] LLaRA: Supercharging Robot Learning Data for Vision-Language Policy ☆227 · Mar 29, 2025 · Updated 11 months ago
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆103 · Feb 22, 2026 · Updated last week
- ☆135 · Apr 16, 2025 · Updated 10 months ago
- ☆32 · Jul 29, 2024 · Updated last year
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding ☆126 · Dec 10, 2024 · Updated last year
- Official repository for "Self-Supervised Video Transformer" (CVPR'22) ☆108 · Jun 26, 2024 · Updated last year
- Plan✕ is a platform for creating and publishing digital planning services ☆17 · Updated this week
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆32 · Mar 2, 2025 · Updated 11 months ago
- ☆18 · Jun 10, 2025 · Updated 8 months ago
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆138 · Aug 21, 2025 · Updated 6 months ago
- Latest Papers, Codes and Datasets on VTG-LLMs. ☆81 · Nov 17, 2025 · Updated 3 months ago