md-mohaiminul / ViS4mer
☆54Updated 2 years ago
Alternatives and similar repositories for ViS4mer:
Users that are interested in ViS4mer are comparing it to the libraries listed below
- ☆26Updated last year
- [CVPR'22 Oral] Temporal Alignment Networks for Long-term Video. Tengda Han, Weidi Xie, Andrew Zisserman.☆115Updated last year
- A PyTorch implementation of EmpiricalMVM☆40Updated last year
- ☆107Updated 2 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"☆99Updated last year
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"☆17Updated last year
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".☆45Updated 9 months ago
- ☆192Updated 2 years ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)☆99Updated last month
- ☆52Updated 2 years ago
- ☆31Updated 4 months ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024☆50Updated 5 months ago
- [CVPR2022] Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos☆95Updated 2 years ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆40Updated last year
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers☆177Updated last year
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆114Updated last year
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"☆132Updated last year
- ☆47Updated 2 years ago
- ☆75Updated last year
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)☆29Updated last year
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"☆123Updated 7 months ago
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆26Updated 8 months ago
- A Unified Framework for Video-Language Understanding☆57Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆72Updated last year
- ☆31Updated last year
- [CVPR'23 Highlight] AutoAD: Movie Description in Context.☆93Updated 3 months ago
- Official implementation of "Everything at Once - Multi-modal Fusion Transformer for Video Retrieval". CVPR 2022☆99Updated 2 years ago
- The implementation of CVPR2021 paper Temporal Query Networks for Fine-grained Video Understanding☆61Updated 2 years ago
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p…☆128Updated 2 years ago
- [ICCV 2023] Accurate and Fast Compressed Video Captioning☆36Updated last year