bpiyush / TestOfTime

Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time

☆45

Alternatives and similar repositories for TestOfTime

Users who are interested in TestOfTime are comparing it to the repositories listed below.

tsujuifu / pytorch_empirical-mvm
A PyTorch implementation of EmpiricalMVM
☆41 Updated last year
salesforce / paprika
Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"
☆50 Updated 6 months ago
facebookresearch / ProcedureVRL
[CVPR 2023] Official code for "Learning Procedure-aware Video Representation from Instructional Videos and Their Narrations"
☆54 Updated last year
YoadTew / zero-shot-video-to-text
☆76 Updated 2 years ago
google-research-datasets / videoCC-data
VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…
☆78 Updated 2 years ago
rowanz / merlot_reserve
Code release for "MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound"
☆143 Updated 3 years ago
microsoft / LAVENDER
A Unified Framework for Video-Language Understanding
☆57 Updated 2 years ago
goel-shashank / CyCLIP
☆120 Updated 2 years ago
RAIVNLab / CREPE
[CVPR23 Highlight] CREPE: Can Vision-Language Foundation Models Reason Compositionally?
☆33 Updated 2 years ago
facebookresearch / EgoVLPv2
Code release for "EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone" [ICCV, 2023]
☆99 Updated last year
antoyang / just-ask
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
☆123 Updated last year
zhaoyue-zephyrus / AVION
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
☆133 Updated last year
lucas-ventura / CoVR
Official PyTorch implementation of the paper "CoVR: Learning Composed Video Retrieval from Web Video Captions".
☆109 Updated 3 months ago
NVlabs / PALAVRA
☆52 Updated 3 years ago
NVlabs / Bongard-HOI
[CVPR 2022 (oral)] Bongard-HOI for benchmarking few-shot visual reasoning
☆71 Updated 2 years ago
TheShadow29 / VidSitu
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
☆61 Updated 3 years ago
facebookresearch / video-distant-supervision
This is an official pytorch implementation of Learning To Recognize Procedural Activities with Distant Supervision. In this repository, w…
☆42 Updated 2 years ago
facebookresearch / vq2d_cvpr
This repo contains the code for the recipe of the winning entry to the Ego4d VQ2D challenge at CVPR 2022.
☆41 Updated 2 years ago
ninatu / howtocaption
Official implementation of "HowToCaption: Prompting LLMs to Transform Video Annotations at Scale." ECCV 2024
☆55 Updated 10 months ago
allenai / gpv-1
A task-agnostic vision-language architecture as a step towards General Purpose Vision
☆92 Updated 4 years ago
Yui010206 / SeViLA
[NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering
☆187 Updated last year
facebookresearch / HierVL
[CVPR 2023] HierVL: Learning Hierarchical Video-Language Embeddings
☆46 Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 Updated 2 years ago
DavidMChan / caption-by-committee
Using LLMs and pre-trained caption models for super-human performance on image captioning.
☆42 Updated last year
zinengtang / TVLT
PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)
☆125 Updated 2 years ago
brown-palm / AntGPT
Official code implementation of the paper "AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?"
☆22 Updated 10 months ago
antoyang / FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
☆157 Updated 7 months ago
fmthoker / SEVERE-BENCHMARK
☆26 Updated last year
leonnnop / VAR
[CVPR 2022] Visual Abductive Reasoning
☆122 Updated 9 months ago
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆51 Updated last year