GenjiB / ECLIPSE
☆31 Updated last year
Related projects
Alternatives and complementary repositories for ECLIPSE
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆95 Updated last year
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆98 Updated 9 months ago
- The 1st place solution of 2022 Ego4d Natural Language Queries. ☆32 Updated 2 years ago
- ☆102 Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆15 Updated 9 months ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context. ☆88 Updated 2 weeks ago
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies" ☆18 Updated last year
- Source code of our MM'22 paper Partially Relevant Video Retrieval ☆51 Updated 2 weeks ago
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and … ☆34 Updated last year
- [ACCV 2024] Official Implementation of "AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description". Junyu Xie, Tengda Han, M… ☆17 Updated last month
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆21 Updated 4 months ago
- MUSIC-AVQA, CVPR2022 (ORAL) ☆67 Updated last year
- ☆50 Updated last year
- https://layer6ai-labs.github.io/xpool/ ☆116 Updated last year
- "Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022. ☆65 Updated 2 years ago
- Pytorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆61 Updated 5 months ago
- Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition" ☆37 Updated 2 weeks ago
- ☆31 Updated 3 years ago
- Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing, ECCV, 2020. (Spotlight) ☆80 Updated 3 months ago
- Vision Transformers are Parameter-Efficient Audio-Visual Learners ☆89 Updated last year
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p… ☆126 Updated 2 years ago
- ☆21 Updated 11 months ago
- This is an official pytorch implementation of Learning To Recognize Procedural Activities with Distant Supervision. In this repository, w… ☆40 Updated last year
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆34 Updated 9 months ago
- ☆31 Updated 8 months ago
- [ Arxiv 2023 ] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection" ☆14 Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆70 Updated 9 months ago
- Codes and Models for COSA: Concatenated Sample Pretrained Vision-Language Foundation Model ☆39 Updated last year
- Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023) ☆61 Updated 9 months ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 Updated last year