JerryYLi / svittLinks
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
☆18Updated 2 years ago
Alternatives and similar repositories for svitt
Users that are interested in svitt are comparing it to the libraries listed below
Sorting:
- ☆24Updated 2 years ago
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆26Updated 6 months ago
- ☆81Updated 2 years ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆46Updated last year
- ☆29Updated 2 years ago
- Offical PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated 2 years ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"☆27Updated 3 months ago
- Data-Efficient Multimodal Fusion on a Single GPU☆66Updated last year
- Official PyTorch implementation of Which Tokens to Use? Investigating Token Reduction in Vision Transformers presented at ICCV 2023 NIVT …☆34Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆41Updated 7 months ago
- Visual self-questioning for large vision-language assistant.☆42Updated this week
- ☆55Updated 3 years ago
- ☆20Updated 9 months ago
- ☆58Updated last year
- ☆53Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆98Updated last month
- Turning to Video for Transcript Sorting☆48Updated last year
- ☆19Updated 2 months ago
- ☆55Updated last year
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆99Updated last year
- ☆63Updated 2 years ago
- Official PyTorch implementation of the paper "Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring"☆104Updated last year
- [CVPR'23] AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders☆79Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆37Updated last year
- [CVPR 2023] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation☆62Updated 4 months ago
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆39Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆51Updated last year
- ☆51Updated 6 months ago
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"☆17Updated 2 years ago