wgcban / apt
PyTorch Implementation of Attention Prompt Tuning: Parameter-Efficient Adaptation of Pre-Trained Models for Action Recognition
☆15 · Updated last year
Alternatives and similar repositories for apt
Users who are interested in apt are comparing it to the repositories listed below.
- [WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling ☆53 · Updated 3 weeks ago
- Implementation of the MC-ViT model from the paper "Memory Consolidation Enables Long-Context Video Understanding" ☆20 · Updated 2 months ago
- Official PyTorch implementation of Self-emerging Token Labeling ☆33 · Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 · Updated last year
- ☆34 · Updated last year
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆39 · Updated last year
- Official implementation of the CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection" ☆41 · Updated 8 months ago
- ☆19 · Updated 3 weeks ago
- [NeurIPS 2024] The official repository of "Distribution-Aware Data Expansion with Diffusion Models" ☆15 · Updated last week
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆34 · Updated 9 months ago
- ☆25 · Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆18 · Updated last month
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor vision-language models ☆18 · Updated 5 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆29 · Updated 8 months ago
- ☆57 · Updated last year
- Code for the paper "Unified Text-to-Image Generation and Retrieval" ☆15 · Updated 11 months ago
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) ☆26 · Updated 2 months ago
- Data-Efficient Multimodal Fusion on a Single GPU ☆64 · Updated last year
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆40 · Updated last year
- ☆23 · Updated 2 years ago
- [TIP] Exploring Effective Factors for Improving Visual In-Context Learning ☆19 · Updated 7 months ago
- ☆11 · Updated 7 months ago
- Official implementation of TagAlign ☆35 · Updated 5 months ago
- ☆63 · Updated 4 months ago
- ☆51 · Updated 2 months ago
- [CVPR 2023] Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ☆84 · Updated last year
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆23 · Updated last month
- Official repository for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆29 · Updated last year
- Official repository for ByteVideoLLM/Dynamic-VLM ☆20 · Updated 5 months ago
- Masked Vision-Language Transformer in Fashion ☆33 · Updated last year