☆58 · Dec 2, 2025 · Updated 3 months ago
Alternatives and similar repositories for ViS4mer
Users who are interested in ViS4mer are comparing it to the repositories listed below.
- ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation) · ☆16 · Jan 18, 2024 · Updated 2 years ago
- [arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day" · ☆138 · Aug 23, 2025 · Updated 6 months ago
- Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning · ☆14 · Apr 25, 2024 · Updated last year
- ☆34 · Jun 2, 2023 · Updated 2 years ago
- Thermal Indoor Motion Dataset · ☆14 · Apr 27, 2023 · Updated 2 years ago
- SMILE: A Multimodal Dataset for Understanding Laughter · ☆13 · Jun 15, 2023 · Updated 2 years ago
- [CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers · ☆193 · Sep 24, 2023 · Updated 2 years ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… · ☆22 · Nov 8, 2023 · Updated 2 years ago
- Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation" · ☆105 · Feb 14, 2023 · Updated 3 years ago
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos · ☆19 · Mar 3, 2025 · Updated last year
- ☆12 · Sep 11, 2021 · Updated 4 years ago
- A guide to structured generation using constrained decoding · ☆14 · Jun 9, 2024 · Updated last year
- Video shot transition detection · ☆25 · Mar 9, 2023 · Updated 3 years ago
- Learning Interactions and Relationships between Movie Characters (CVPR'20) · ☆22 · Apr 12, 2023 · Updated 2 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · Mar 15, 2024 · Updated last year
- Code for the paper Joint Discovery of Object States and Manipulation Actions, ICCV 2017 · ☆14 · Aug 7, 2018 · Updated 7 years ago
- [NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale · ☆204 · Nov 13, 2023 · Updated 2 years ago
- Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos · ☆28 · Dec 8, 2023 · Updated 2 years ago
- ☆26 · Aug 31, 2023 · Updated 2 years ago
- Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon! · ☆11 · May 24, 2023 · Updated 2 years ago
- Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo… · ☆13 · Oct 23, 2022 · Updated 3 years ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) · ☆15 · Jan 7, 2025 · Updated last year
- ☆31 · Mar 24, 2022 · Updated 3 years ago
- Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024 · ☆58 · Aug 19, 2025 · Updated 6 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … · ☆129 · Apr 4, 2025 · Updated 11 months ago
- [BMVC 2024] On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models · ☆15 · Nov 1, 2024 · Updated last year
- Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) · ☆55 · Sep 7, 2023 · Updated 2 years ago
- generate synthetic data for training finite state machines/pushdown automata/turing machines · ☆17 · Apr 26, 2024 · Updated last year
- [CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally? · ☆35 · Apr 27, 2023 · Updated 2 years ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 23 Spotlight · ☆37 · May 23, 2023 · Updated 2 years ago
- Detect individual instruments activity in an audio file. 🎤🎹🎸🥁 · ☆16 · Jun 29, 2021 · Updated 4 years ago
- ☆13 · Jul 20, 2024 · Updated last year
- ☆21 · Nov 24, 2022 · Updated 3 years ago
- Official This-Is-My Dataset published in CVPR 2023 · ☆16 · Jul 18, 2024 · Updated last year
- A Unified Framework for Video-Language Understanding · ☆61 · Jun 17, 2023 · Updated 2 years ago
- ☆87 · Mar 4, 2024 · Updated 2 years ago
- Easiest way of fine-tuning HuggingFace video classification models · ☆148 · Mar 20, 2023 · Updated 2 years ago
- Official implementation of the paper How to Listen? Rethinking Visual Sound Localization · ☆18 · Apr 25, 2022 · Updated 3 years ago
- ☆18 · Aug 19, 2024 · Updated last year