LJungang / SAVEn-Vid
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
⭐ 5 · Updated 2 months ago
Alternatives and similar repositories for SAVEn-Vid:
Users interested in SAVEn-Vid are comparing it to the libraries listed below.
- See How Top MLLMs Understand Video Compositions. · ⭐ 18 · Updated 2 months ago
- [Neurips 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. · ⭐ 85 · Updated 7 months ago
- ⭐ 87 · Updated 2 months ago
- Unified Audio-Visual Perception for Multi-Task Video Localization · ⭐ 24 · Updated 10 months ago
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan… · ⭐ 32 · Updated last month
- Video Chain of Thought: code for the ICML 2024 paper "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition" · ⭐ 100 · Updated this week
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability · ⭐ 85 · Updated 3 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling · ⭐ 70 · Updated last month
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection · ⭐ 85 · Updated 7 months ago
- [CVPR'25] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection · ⭐ 51 · Updated this week
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding · ⭐ 87 · Updated 2 months ago
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". · ⭐ 30 · Updated 2 weeks ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding · ⭐ 49 · Updated 8 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ⭐ 105 · Updated last week
- Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding · ⭐ 25 · Updated this week
- ⭐ 23 · Updated 5 months ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs · ⭐ 39 · Updated this week
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos · ⭐ 37 · Updated 10 months ago
- Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". · ⭐ 18 · Updated 2 weeks ago
- The repository contains the official implementation of "Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation" · ⭐ 33 · Updated 3 months ago
- ⭐ 25 · Updated 4 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models · ⭐ 84 · Updated 2 months ago
- ⭐ 48 · Updated last week
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga · ⭐ 56 · Updated 3 months ago
- ⭐ 89 · Updated 7 months ago
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023) · ⭐ 62 · Updated last year
- Accepted by CVPR 2024 · ⭐ 32 · Updated 9 months ago
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models · ⭐ 15 · Updated 7 months ago