SamsungLabs / video-retrieval-sampler
The official implementation for the paper 'mmSampler: Efficient Frame Sampler for Multimodal Video Retrieval'.
☆9 · Updated 2 years ago
Alternatives and similar repositories for video-retrieval-sampler:
Users who are interested in video-retrieval-sampler are comparing it to the libraries listed below.
- Use CLIP to represent video for Retrieval Task ☆69 · Updated 3 years ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆42 · Updated 8 months ago
- Learning Versatile Neural Architectures by Propagating Network Codes ☆38 · Updated last year
- A PyTorch implementation of VIOLET ☆137 · Updated last year
- Video Contrastive Learning with Global Context, ICCVW 2021 ☆158 · Updated 2 years ago
- [ICLR 2022] "Unified Vision Transformer Compression" by Shixing Yu*, Tianlong Chen*, Jiayi Shen, Huan Yuan, Jianchao Tan, Sen Yang, Ji Li… ☆52 · Updated last year
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. Also, a text-video retrieval toolbox based on CLIP + fast p… ☆128 · Updated 2 years ago
- A task-agnostic vision-language architecture as a step towards General Purpose Vision ☆92 · Updated 3 years ago
- [ECCV2022] Contrastive Vision-Language Pre-training with Limited Resources ☆45 · Updated 2 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark ☆51 · Updated last year
- Research code for "Training Vision-Language Transformers from Captions Alone" ☆33 · Updated 2 years ago
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… ☆76 · Updated 2 years ago
- MDMMT: Multidomain Multimodal Transformer for Video Retrieval ☆26 · Updated 3 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation) ☆84 · Updated last year
- Benchmarking Attention Mechanism in Vision Transformers. ☆17 · Updated 2 years ago
- A Unified Framework for Video-Language Understanding ☆56 · Updated last year
- [AAAI 2021] Confidence-aware Non-repetitive Multimodal Transformers for TextCaps ☆24 · Updated last year
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 · Updated 3 years ago
- Code for NASViT ☆68 · Updated 2 years ago
- A huge dataset for Document Visual Question Answering ☆15 · Updated 6 months ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 · Updated 2 years ago
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning" ☆131 · Updated last year
- CUDA implementation of depthwise conv3d ☆22 · Updated 3 years ago