SamsungLabs / video-retrieval-sampler
The official implementation for the paper 'mmSampler: Efficient Frame Sampler for Multimodal Video Retrieval'.
☆9Updated 2 years ago
Related projects
Alternatives and complementary repositories for video-retrieval-sampler
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa…☆76Updated 2 years ago
- [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources☆44Updated 2 years ago
- Use CLIP to represent video for Retrieval Task☆69Updated 3 years ago
- A PyTorch implementation of VIOLET☆137Updated 11 months ago
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos☆60Updated last year
- Code for the paper titled "CiT: Curation in Training for Effective Vision-Language Data".☆78Updated last year
- Code for the Video Similarity Challenge.☆75Updated 9 months ago
- ☆31Updated 3 years ago
- Towards Video Text Visual Question Answering: Benchmark and Baseline☆37Updated 8 months ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM☆36Updated 5 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆22Updated 10 months ago
- Video Contrastive Learning with Global Context, ICCVW 2021☆158Updated 2 years ago
- ICCV DeeperAction Challenge - Kinetics-TPS Challenge on Part-level Action Parsing and Action Recognition.☆15Updated 3 years ago
- ☆33Updated 3 years ago
- CLIP-It! Language-Guided Video Summarization☆73Updated 3 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆50Updated last year
- ☆21Updated 10 months ago
- This is the official PyTorch implementation for "Mesa: A Memory-saving Training Framework for Transformers".☆118Updated 2 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆84Updated last year
- ☆102Updated last year
- VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automa…☆76Updated last year
- Official implementation of AdaMML. https://arxiv.org/abs/2105.05165.☆50Updated 2 years ago
- PyTorch code for MUST☆105Updated last year
- [CVPR2022 Oral] The official code for "TransRank: Self-supervised Video Representation Learning via Ranking-based Transformation Recognit…☆18Updated 2 years ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆135Updated 2 weeks ago
- Interaction Compass: Multi-Label Zero-Shot Learning of Human-Object Interactions via Spatial Relations @ ICCV21☆13Updated 2 years ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆132Updated last year
- Official implementation of Kangaroo: A Powerful Video-Language Model Supporting Long-context Video Input☆54Updated 2 months ago
- A PyTorch toolkit for extremely fast ImageNet training with NVIDIA DALI.☆51Updated 3 years ago