wgcban / apt
PyTorch Implementation of Attention Prompt Tuning: Parameter-Efficient Adaptation of Pre-Trained Models for Action Recognition
☆13 Updated 8 months ago
Related projects
Alternatives and complementary repositories for apt
- ☆52 Updated last year
- ☆33 Updated 10 months ago
- ☆17 Updated 7 months ago
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues ☆50 Updated 6 months ago
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data ☆32 Updated 8 months ago
- Data-Efficient Multimodal Fusion on a Single GPU ☆47 Updated 6 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆33 Updated 3 months ago
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆16 Updated last week
- ☆55 Updated 6 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling ☆30 Updated 7 months ago
- Official repository of paper "Subobject-level Image Tokenization" ☆62 Updated 6 months ago
- [CVPR 2023] Zero-shot Generative Model Adaptation via Image-specific Prompt Learning ☆82 Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆22 Updated 10 months ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆37 Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆60 Updated 2 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆110 Updated 3 months ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆39 Updated 3 months ago
- ☆14 Updated 6 months ago
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆31 Updated 2 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆56 Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆49 Updated 5 months ago
- Adapting LLaMA Decoder to Vision Transformer ☆27 Updated 6 months ago
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". … ☆39 Updated 2 weeks ago
- Official implementation of TagAlign ☆32 Updated 7 months ago
- [ECCV2024][ICCV2023] Official PyTorch implementation of SeiT++ and SeiT ☆51 Updated 3 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆24 Updated 2 weeks ago
- Multimodal Video Understanding Framework (MVU) ☆23 Updated 6 months ago
- Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs. ☆88 Updated 7 months ago
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models ☆28 Updated last month
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity" ☆54 Updated last year