md-mohaiminul / ViS4mer
☆54 Updated 2 years ago
Alternatives and similar repositories for ViS4mer:
Users who are interested in ViS4mer are comparing it to the repositories listed below.
- ☆105 Updated 2 years ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval". CVPR 2022 ☆98 Updated 2 years ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆96 Updated last week
- ☆73 Updated last year
- ☆25 Updated last year
- Codebase for the paper: "TIM: A Time Interval Machine for Audio-Visual Action Recognition" ☆38 Updated 2 months ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring" ☆99 Updated last year
- ☆53 Updated 2 years ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context. ☆91 Updated 2 months ago
- ☆31 Updated 3 years ago
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by … ☆72 Updated last year
- Official Code of ECCV 2022 paper MS-CLIP ☆88 Updated 2 years ago
- [CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman. ☆115 Updated last year
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024 ☆50 Updated 3 months ago
- ☆30 Updated 2 months ago
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers ☆175 Updated last year
- This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning with Cross-modal Attention and … ☆35 Updated 2 years ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆39 Updated last year
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆113 Updated last year
- ☆192 Updated 2 years ago
- ☆22 Updated last year
- [ICCV 2023] Accurate and Fast Compressed Video Captioning ☆36 Updated 11 months ago
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies" ☆17 Updated last year
- A Unified Framework for Video-Language Understanding ☆56 Updated last year
- [ECCVW'24] Long-form Video Understanding by Bridging Episodic Memory and Semantic Knowledge ☆22 Updated 4 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆48 Updated 7 months ago
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions ☆156 Updated last year
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day" ☆119 Updated 5 months ago
- ☆23 Updated 4 months ago
- ☆61 Updated last year