mshukor / ViCHALinks
[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆55Updated 2 years ago
Alternatives and similar repositories for ViCHA
Users that are interested in ViCHA are comparing it to the libraries listed below
Sorting:
- A task-agnostic vision-language architecture as a step towards General Purpose Vision☆92Updated 3 years ago
- Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training☆137Updated 2 years ago
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆39Updated last year
- Code for the paper titled "CiT Curation in Training for Effective Vision-Language Data".☆78Updated 2 years ago
- ☆50Updated 2 years ago
- Code and Models for "GeneCIS A Benchmark for General Conditional Image Similarity"☆59Updated 2 years ago
- Command-line tool for downloading and extending the RedCaps dataset.☆48Updated last year
- https://arxiv.org/abs/2209.15162☆50Updated 2 years ago
- A Unified Framework for Video-Language Understanding☆57Updated 2 years ago
- Official repository for the General Robust Image Task (GRIT) Benchmark☆54Updated 2 years ago
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"☆134Updated 2 years ago
- Patching open-vocabulary models by interpolating weights☆91Updated last year
- PyTorch code for MUST☆107Updated last month
- ☆83Updated 3 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆205Updated 2 years ago
- PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)☆33Updated 2 years ago
- TallyQA: Answering Complex Counting Questions dataset☆24Updated last year
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- ☆30Updated last year
- Research code for "Training Vision-Language Transformers from Captions Alone"☆34Updated 2 years ago
- ☆29Updated 2 years ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆24Updated 7 months ago
- This code provides a PyTorch implementation for OTTER (Optimal Transport distillation for Efficient zero-shot Recognition), as described …☆69Updated 3 years ago
- [CVPR 2023] Learning Visual Representations via Language-Guided Sampling☆149Updated 2 years ago
- Compress conventional Vision-Language Pre-training data☆51Updated last year
- A PyTorch implementation of EmpiricalMVM☆41Updated last year
- RO-ViT CVPR 2023 "Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers"☆18Updated last year
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆31Updated last year
- Implementation of MaMMUT, a simple vision-encoder text-decoder architecture for multimodal tasks from Google, in Pytorch☆103Updated last year
- [ICLR2024] The official implementation of paper "UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling", by …☆74Updated last year