md-mohaiminul / ViS4merView external linksLinks
☆58Dec 2, 2025Updated 2 months ago
Alternatives and similar repositories for ViS4mer
Users that are interested in ViS4mer are comparing it to the libraries listed below
Sorting:
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)☆16Jan 18, 2024Updated 2 years ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"☆138Aug 23, 2025Updated 5 months ago
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning☆14Apr 25, 2024Updated last year
- ☆34Jun 2, 2023Updated 2 years ago
- Thermal Indoor Motion Dataset☆14Apr 27, 2023Updated 2 years ago
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers☆192Sep 24, 2023Updated 2 years ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…☆22Nov 8, 2023Updated 2 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆18Jul 10, 2025Updated 7 months ago
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"☆105Feb 14, 2023Updated 3 years ago
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆19Mar 3, 2025Updated 11 months ago
- [MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"☆14Nov 1, 2024Updated last year
- A guide to structured generation using constrained decoding☆13Jun 9, 2024Updated last year
- Video shot transition detection☆25Mar 9, 2023Updated 2 years ago
- Learning Interactions and Relationships between Movie Characters (CVPR'20)☆22Apr 12, 2023Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated last year
- Code for the paper Joint Discovery of Object States and Manipulation Actions, ICCV 2017☆14Aug 7, 2018Updated 7 years ago
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale☆203Nov 13, 2023Updated 2 years ago
- Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…☆54Apr 21, 2023Updated 2 years ago
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos☆28Dec 8, 2023Updated 2 years ago
- ☆26Aug 31, 2023Updated 2 years ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024)☆15Jan 7, 2025Updated last year
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!☆11May 24, 2023Updated 2 years ago
- Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…☆12Oct 23, 2022Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆158Dec 9, 2024Updated last year
- ☆31Mar 24, 2022Updated 3 years ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆128Apr 4, 2025Updated 10 months ago
- [AAAI 2023 (Oral)] CrissCross: Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity☆25Jul 11, 2023Updated 2 years ago
- ☆53Jan 3, 2023Updated 3 years ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models☆15Nov 1, 2024Updated last year
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)☆55Sep 7, 2023Updated 2 years ago
- [CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?☆35Apr 27, 2023Updated 2 years ago
- Official This-Is-My Dataset published in CVPR 2023☆16Jul 18, 2024Updated last year
- A Unified Framework for Video-Language Understanding☆61Jun 17, 2023Updated 2 years ago
- ☆13Jul 20, 2024Updated last year
- ☆18Jan 30, 2023Updated 3 years ago
- Detect individual instruments activity in an audio file. 🎤🎹🎸🥁☆16Jun 29, 2021Updated 4 years ago
- ☆21Nov 24, 2022Updated 3 years ago
- Easiest way of fine-tuning HuggingFace video classification models☆147Mar 20, 2023Updated 2 years ago
- [WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"☆16Feb 24, 2025Updated 11 months ago