mlvlab / VT-TWINS
Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)
☆15 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for VT-TWINS
- Official implementation of the CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer" ☆37 · Updated 6 months ago
- PyTorch implementation of "Video-Text Pre-training with Learned Regions" ☆42 · Updated 2 years ago
- [CVPR 2022] Code for the paper "Object-aware Video-language Pre-training for Retrieval" ☆62 · Updated 2 years ago
- Official PyTorch implementation of "Learning To Recognize Procedural Activities with Distant Supervision". In this repository, w… ☆40 · Updated last year
- [arXiv 2022] Revitalize Region Feature for Democratizing Video-Language Pre-training ☆21 · Updated 2 years ago
- A Unified Framework for Video-Language Understanding ☆56 · Updated last year
- Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20… ☆18 · Updated 7 months ago
- Winning solution for the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop) ☆29 · Updated 10 months ago
- 1st-place solution for the 2022 Ego4D Natural Language Queries challenge ☆32 · Updated 2 years ago
- Research code for "Training Vision-Language Transformers from Captions Alone" ☆33 · Updated 2 years ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆55 · Updated last month
- Temporal Alignment Representations with Contrastive Learning ☆22 · Updated last year
- Code for the paper "Zero-shot Natural Language Video Localization" (ICCV 2021, Oral) ☆46 · Updated last year
- Repository for the paper "Paxion: Patching Action Knowledge in Video-Language Foundation Models" (NeurIPS 2023 Spotlight) ☆35 · Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023) ☆31 · Updated last year
- A PyTorch implementation of EmpiricalMVM ☆39 · Updated 11 months ago
- Code for the paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆18 · Updated 2 years ago
- Official code for "Disentangling Visual Embeddings for Attributes and Objects" (CVPR 2022) ☆33 · Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆22 · Updated this week
- Compress conventional Vision-Language Pre-training data ☆49 · Updated last year
- Official implementation of "Elaborative Rehearsal for Zero-shot Action Recognition" (ICCV 2021) ☆36 · Updated 2 years ago
- [ICCV 2021] Generic Event Boundary Detection: A Benchmark for Event Segmentation ☆68 · Updated 2 years ago
- Official repo for the CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (… ☆49 · Updated 5 months ago
- [ECCVW'24] Long-form Video Understanding by Bridging Episodic Memory and Semantic Knowledge ☆14 · Updated last month