[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆54 · Oct 20, 2022 · Updated 3 years ago
Alternatives and similar repositories for ViCHA
Users that are interested in ViCHA are comparing it to the libraries listed below.
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆34 · Apr 11, 2024 · Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20 ☆29 · May 26, 2022 · Updated 3 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models. ☆27 · Oct 27, 2023 · Updated 2 years ago
- ☆27 · Aug 1, 2024 · Updated last year
- Official implementation of the paper The Hidden Language of Diffusion Models ☆78 · Jan 24, 2024 · Updated 2 years ago
- ☆87 · Apr 15, 2022 · Updated 3 years ago
- Weakly Supervised Grounding for VQA in Vision-Language Transformers ☆16 · May 6, 2023 · Updated 2 years ago
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026) ☆37 · Mar 8, 2026 · Updated 2 weeks ago
- [WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning ☆17 · May 7, 2025 · Updated 10 months ago
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning. ☆24 · Aug 5, 2023 · Updated 2 years ago
- [ICLR 23] Contrastive Alignment of Vision to Language Through Parameter-Efficient Transfer Learning ☆40 · Jul 29, 2023 · Updated 2 years ago
- BISON: Binary Image SelectiON ☆49 · Sep 15, 2021 · Updated 4 years ago
- The efficient tuning method for VLMs ☆81 · Mar 10, 2024 · Updated 2 years ago
- ☆73 · Jun 3, 2022 · Updated 3 years ago
- PyTorch implementation of Data2Vec self-supervised approach for vision use cases. ☆18 · Oct 7, 2022 · Updated 3 years ago
- Cross-Modal Retrieval with Partially Mismatched Pairs (IEEE TPAMI 2023, PyTorch Code) ☆23 · Sep 17, 2023 · Updated 2 years ago
- ☆45 · Aug 14, 2023 · Updated 2 years ago
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021). ☆47 · Aug 19, 2021 · Updated 4 years ago
- Release of ImageNet-Captions ☆51 · Jan 20, 2023 · Updated 3 years ago
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition" ☆259 · May 3, 2024 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆43 · Mar 6, 2026 · Updated 2 weeks ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆12 · Feb 27, 2024 · Updated 2 years ago
- ☆13 · Jun 3, 2024 · Updated last year
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval. ☆134 · May 4, 2022 · Updated 3 years ago
- Siamese network for unsupervised speech representation learning ☆11 · Oct 12, 2018 · Updated 7 years ago
- Phrase Localization Evaluation Toolkit ☆20 · Aug 16, 2019 · Updated 6 years ago
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context … ☆22 · Mar 1, 2024 · Updated 2 years ago
- An official pytorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval]. ☆14 · Jul 27, 2024 · Updated last year
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022) ☆210 · Dec 18, 2022 · Updated 3 years ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022) ☆84 · Nov 2, 2022 · Updated 3 years ago
- ☆10 · Jan 9, 2025 · Updated last year
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024 ☆33 · Jun 18, 2025 · Updated 9 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 · Apr 12, 2023 · Updated 2 years ago
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query (ICCV2021) ☆20 · Dec 4, 2021 · Updated 4 years ago
- The source code of the paper: "To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression" ☆30 · Jan 8, 2019 · Updated 7 years ago
- Learning to Initialize Neural Networks for Stable and Efficient Training ☆138 · May 24, 2022 · Updated 3 years ago
- Channel Equilibrium Networks for Learning Deep Representation, ICML2020 ☆22 · Jul 28, 2020 · Updated 5 years ago
- Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining ☆29 · Apr 4, 2022 · Updated 3 years ago