LJungang / SAVEn-Vid
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
☆5Updated 6 months ago
Alternatives and similar repositories for SAVEn-Vid
Users who are interested in SAVEn-Vid are comparing it to the repositories listed below
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"☆10Updated 7 months ago
- ☆16Updated 6 months ago
- ☆11Updated 3 months ago
- Code for the paper "ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions" published at CVPR 2025☆15Updated 4 months ago
- ☆8Updated 5 months ago
- LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval☆8Updated 7 months ago
- This project is primarily the assessment project for the 2025 Zhejiang University School of Software summer camp (AI camp)☆11Updated 4 months ago
- [EMNLP 2024] TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering☆15Updated 8 months ago
- Benchmarking for Audio-Text and Audio-Visual Generation; Supports FAD, FD_VGG, FD_PANNs, FD_PaSST, IS_PaSST, IS_PANNs, KL_PaSST, KL_PANNs…☆21Updated 4 months ago
- [ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding☆9Updated 3 months ago
- [ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors☆23Updated 8 months ago
- Renderer for the Crello dataset☆9Updated 5 months ago
- Official PyTorch implementation of the paper "Filter, Correlate, Compress: Training-Free Token Reduction for MLLM Acceleration"☆15Updated 3 months ago
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning☆15Updated 5 months ago
- ☆14Updated 7 months ago
- ☆8Updated 7 months ago
- ☆9Updated 6 months ago
- ☆19Updated 6 months ago
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering☆36Updated 3 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability☆94Updated 7 months ago
- [⭐️ WACV 2025 Oral ⭐️] PETALface: Parameter Efficient Transfer Learning for Low-resolution Face Recognition☆13Updated last month
- KV cache compression via sparse coding☆11Updated 2 months ago
- Official PyTorch implementation of "Cross-Attention Head Position Patterns Can Align with Human Visual Concepts in Text-to-Image Generati…☆9Updated 7 months ago
- Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"☆32Updated 4 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆104Updated 5 months ago
- ☆22Updated 2 weeks ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models☆58Updated 2 weeks ago
- ☆16Updated 2 months ago
- Papers of "A Survey on Large Multi-Modal Models from the Perspective of Input-Output Space Extension"☆10Updated 7 months ago
- Adapt MLLMs to Domains via Post-Training☆9Updated 6 months ago