JerryYLi / svitt
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
☆18Updated last year
Alternatives and similar repositories for svitt:
Users that are interested in svitt are comparing it to the libraries listed below
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 4 months ago
- Turning to Video for Transcript Sorting☆48Updated last year
- Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)☆30Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆30Updated last year
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆38Updated last year
- ☆23Updated 2 years ago
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by perform cluster-based masking on images.☆27Updated 10 months ago
- Towards a Unified View on Visual Parameter-Efficient Transfer Learning☆26Updated 2 years ago
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023)☆38Updated last year
- ☆47Updated 2 years ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆37Updated 3 months ago
- Code for CVPR2023 paper "Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies"☆17Updated 2 years ago
- A Unified Framework for Video-Language Understanding☆57Updated last year
- Offical PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆41Updated 2 years ago
- [IJCAI 2023] Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment☆51Updated 11 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated last year
- ☆29Updated last year
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval☆15Updated 10 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆18Updated last month
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings☆45Updated last year
- FreeVA: Offline MLLM as Training-Free Video Assistant☆57Updated 9 months ago
- ☆50Updated 9 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆67Updated 5 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆53Updated 9 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 7 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆33Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs☆25Updated 2 months ago
- [CVPR 2024] The official implementation of paper "synthesize, diagnose, and optimize: towards fine-grained vision-language understanding"☆40Updated last month
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding"☆21Updated 2 months ago