md-mohaiminul/ViS4mer

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/md-mohaiminul/ViS4mer)

md-mohaiminul / ViS4mer

☆58

Alternatives and similar repositories for ViS4mer

Users that are interested in ViS4mer are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

zhaoyue-zephyrus / AVION
View on GitHub
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆138Aug 23, 2025Updated 11 months ago
md-mohaiminul / TranS4mer
View on GitHub
☆34Jun 2, 2023Updated 3 years ago
ilkerkesen / ViLMA
View on GitHub
ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models (ICLR 2024, Official Implementation)
☆16Jan 18, 2024Updated 2 years ago
SMILE-data / SMILE
View on GitHub
SMILE: A Multimodal Dataset for Understanding Laughter
☆13Jun 15, 2023Updated 3 years ago
antoyang / TubeDETR
View on GitHub
[CVPR 2022 Oral] TubeDETR: Spatio-Temporal Video Grounding with Transformers
☆194Sep 24, 2023Updated 2 years ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
MCG-NJU / OCSampler
View on GitHub
[CVPR 2022] OCSampler: Compressing Videos to One Clip with Single-step Sampling
☆17Jun 21, 2022Updated 4 years ago
Tanveer81 / RGNet
View on GitHub
This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos
☆20Mar 3, 2025Updated last year
zjr2000 / GVL
View on GitHub
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
☆28Dec 8, 2023Updated 2 years ago
ethanlshen / HierNet
View on GitHub
Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation…
☆23Nov 8, 2023Updated 2 years ago
klauscc / DAM
View on GitHub
Official code for DAM: Dynamic Adapter Merging for Continual Video QA Learning
☆15Apr 25, 2024Updated 2 years ago
TencentYoutuResearch / SceneSegmentation-SCRL
View on GitHub
Code for CVPR 2022 paper "Scene Consistency Representation Learning for Video Scene Segmentation"
☆112Feb 14, 2023Updated 3 years ago
kahnchana / clippy
View on GitHub
Perceptual Grouping in Contrastive Vision-Language Models (ICCV'23)
☆37Jan 1, 2024Updated 2 years ago
fmthoker / SEVERE-BENCHMARK
View on GitHub
☆26Aug 31, 2023Updated 2 years ago
HimangiM / RepLAI
View on GitHub
Self-supervised algorithm for learning representations from ego-centric video data. Code is tested on EPIC-Kitchens-100 and Ego4D in PyTo…
☆13Oct 23, 2022Updated 3 years ago
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
jalayrac / object-states-action
View on GitHub
Code for the paper Joint Discovery of Object States and Manipulation Actions, ICCV 2017
☆14Aug 7, 2018Updated 7 years ago
Augusta-A / Awesome-EfficientVideo
View on GitHub
☆12Sep 11, 2021Updated 4 years ago
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133Apr 4, 2025Updated last year
JunweiLiang / MultiTrain
View on GitHub
Code and model for "Multi-dataset Training of Transformers for Robust Action Recognition", NeurIPS 2022 Spotlight
☆20Aug 1, 2023Updated 2 years ago
klauscc / VindLU
View on GitHub
☆108Dec 23, 2022Updated 3 years ago
antoyang / FrozenBiLM
View on GitHub
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆159Dec 9, 2024Updated last year
hananshafi / MedContext
View on GitHub
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14Nov 1, 2024Updated last year
Annusha / LIReC
View on GitHub
Learning Interactions and Relationships between Movie Characters (CVPR'20)
☆22Apr 12, 2023Updated 3 years ago
amitakamath / vl_text_encoders_are_bottlenecks
View on GitHub
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11May 24, 2023Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
MCG-NJU / AMD
View on GitHub
[CVPR 2024] Asymmetric Masked Distillation for Pre-Training Small Foundation Models
☆18Jan 11, 2026Updated 6 months ago
microsoft / LAVENDER
View on GitHub
A Unified Framework for Video-Language Understanding
☆62Jun 17, 2023Updated 3 years ago
Muzammal-Naseer / DCViT-AT
View on GitHub
Official repository for "Boosting Adversarial Transferability using Dynamic Cues " (ICLR 2023)
☆20Aug 24, 2023Updated 2 years ago
j-min / HiREST
View on GitHub
Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)
☆110Jan 23, 2025Updated last year
AndongDeng / BEAR
View on GitHub
BEAR: a new BEnchmark on video Action Recognition
☆46Apr 21, 2024Updated 2 years ago
leexinhao / ZeroI2V
View on GitHub
[ECCV 2024] ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
☆20Jul 29, 2024Updated last year
Catherine-R-He / HetNet
View on GitHub
Codes for the AAAI 2023 paper (Oral) "Efficient Mirror Detection via Multi-level Heterogeneous Learning" https://arxiv.org/pdf/2211.1564…
☆15Jan 18, 2023Updated 3 years ago
alibaba-mmai-research / DiST
View on GitHub
ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
☆41Sep 25, 2023Updated 2 years ago
antoyang / VidChapters
View on GitHub
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
☆213Nov 13, 2023Updated 2 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
OpenGVLab / unmasked_teacher
View on GitHub
[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
☆348May 27, 2024Updated 2 years ago
zinengtang / TVLT
View on GitHub
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
☆127Feb 24, 2023Updated 3 years ago
jinhyunj / EaTR
View on GitHub
Official pytorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023)
☆55Sep 7, 2023Updated 2 years ago
klauscc / TALLFormer
View on GitHub
☆53Jan 3, 2023Updated 3 years ago
facebookresearch / Motionformer
View on GitHub
Code + pre-trained models for the paper Keeping Your Eye on the Ball Trajectory Attention in Video Transformers
☆234Jun 13, 2022Updated 4 years ago
tyshiwo1 / Awesome-Visual-Tokenizer
View on GitHub
Awesome Visual Tokenizers/Autoencoders
☆20Nov 19, 2025Updated 8 months ago
aszala / VPEval
View on GitHub
VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆45Nov 29, 2023Updated 2 years ago