bfshi / AbSViT
Official code for "Top-Down Visual Attention from Analysis by Synthesis" (CVPR 2023 highlight)
☆160 Updated last year
Related projects:
- ☆98 Updated 6 months ago
- [ICCV 2023] Binary Adapters, [AAAI 2023] FacT, [Tech report] Convpass ☆167 Updated last year
- [NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition" ☆317 Updated 2 years ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆55 Updated 5 months ago
- Exploring Visual Prompts for Adapting Large-Scale Models ☆260 Updated 2 years ago
- [ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction ☆161 Updated 7 months ago
- Open-vocabulary Semantic Segmentation ☆162 Updated last year
- ☆55 Updated last year
- [ICCV 2023] Code for "Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement" ☆135 Updated 4 months ago
- [ICLR'23] AIM: Adapting Image Models for Efficient Video Action Recognition ☆265 Updated last year
- A temporary webpage for our survey in AGI for computer vision ☆117 Updated 4 months ago
- [ICLR2023] PLOT: Prompt Learning with Optimal Transport for Vision-Language Models ☆137 Updated 9 months ago
- This is a PyTorch implementation of "Context AutoEncoder for Self-Supervised Representation Learning" ☆190 Updated last year
- PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation" ☆78 Updated last year
- [NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions ☆59 Updated 4 months ago
- [CVPR 2023] This repository includes the official implementation of our paper "Masked Autoencoders Enable Efficient Knowledge Distillers" ☆97 Updated last year
- SeqTR: A Simple yet Universal Network for Visual Grounding ☆128 Updated 3 months ago
- ☆178 Updated last year
- [ICCV'23 Main Track, WECIA'23 Oral] Official repository of paper titled "Self-regulating Prompts: Foundational Model Adaptation without F… ☆219 Updated 11 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆62 Updated 4 months ago
- ☆77 Updated 2 years ago
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆110 Updated 8 months ago
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆76 Updated last year
- Referring Video Object Segmentation / Multi-Object Tracking Repo ☆84 Updated last year
- ☆104 Updated 2 months ago
- Open source implementation of "Vision Transformers Need Registers" ☆126 Updated last week
- ☆106 Updated 3 months ago
- ☆79 Updated last year
- ☆171 Updated last year
- MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning ☆128 Updated last year