[BMVC22] Official Implementation of ViCHA: "Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment"
☆54Oct 20, 2022Updated 3 years ago
Alternatives and similar repositories for ViCHA
Users that are interested in ViCHA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- Context-Aware Multi-View Summarization Network for Image-Text Matching. ACM MM'20☆29May 26, 2022Updated 4 years ago
- [ICCV23] Official implementation of eP-ALM: Efficient Perceptual Augmentation of Language Models.☆27Oct 27, 2023Updated 2 years ago
- Pytorch-based tools for constructing a vocabulary of visual concepts in a GAN.☆17Feb 25, 2022Updated 4 years ago
- ☆28Aug 1, 2024Updated last year
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Weakly Supervised Grounding for VQA in Vision-Language Transformers☆16May 6, 2023Updated 3 years ago
- ☆92Apr 15, 2022Updated 4 years ago
- Official implementation of the paper The Hidden Language of Diffusion Models☆77Jan 24, 2024Updated 2 years ago
- ☆37Oct 7, 2023Updated 2 years ago
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026)☆43Mar 8, 2026Updated 3 months ago
- GAN(TK)²: GAN Neural Tangent Kernel ToolKit☆13Jul 12, 2022Updated 3 years ago
- [WACV 2024] Instruct Me More! Random Prompting for Visual In-Context Learning☆17May 7, 2025Updated last year
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment☆10Mar 1, 2025Updated last year
- Implementation of our IJCAI2022 oral paper, ER-SAN: Enhanced-Adaptive Relation Self-Attention Network for Image Captioning.☆25Aug 5, 2023Updated 2 years ago
- End-to-end encrypted cloud storage - Proton Drive • AdSpecial offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
- [ICLR 23] Contrastive Aligned of Vision to Language Through Parameter-Efficient Transfer Learning☆40Jul 29, 2023Updated 2 years ago
- BISON: Binary Image SelectiON☆49Sep 15, 2021Updated 4 years ago
- The efficient tuning method for VLMs☆82Mar 10, 2024Updated 2 years ago
- ☆73Jun 3, 2022Updated 4 years ago
- PyTorch implementation of Data2Vec self-supervised approach for vision use cases.☆18Oct 7, 2022Updated 3 years ago
- Cross-Modal Retrieval with Partially Mismatched Pairs (IEEE TPAMI 2023, PyTorch Code)☆23Sep 17, 2023Updated 2 years ago
- Official Code of ECCV 2022 paper MS-CLIP☆91Jul 27, 2022Updated 3 years ago
- ☆45Aug 14, 2023Updated 2 years ago
- PyTorch implementation of HANet: Hierarchical Alignment Networks for Video-Text Retrieval (ACM MM 2021).☆47Aug 19, 2021Updated 4 years ago
- Serverless GPU API endpoints on Runpod - Get Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- Release of ImageNet-Captions☆51Jan 20, 2023Updated 3 years ago
- Official implementation for the paper "Prompt Pre-Training with Over Twenty-Thousand Classes for Open-Vocabulary Visual Recognition"☆259May 3, 2024Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆49May 7, 2026Updated last month
- ☆13Jun 3, 2024Updated 2 years ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- [SIGIR 2022] CenterCLIP: Token Clustering for Efficient Text-Video Retrieval.☆135May 4, 2022Updated 4 years ago
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context …☆22Mar 1, 2024Updated 2 years ago
- PyTorch code for "VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks" (CVPR2022)☆211Dec 18, 2022Updated 3 years ago
- An official pytorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval].☆14Jul 27, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆10Jan 9, 2025Updated last year
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)☆85Nov 2, 2022Updated 3 years ago
- USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval, TIP 2024☆33Jun 18, 2025Updated 11 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking☆13Apr 12, 2023Updated 3 years ago
- Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query (ICCV2021)☆20Dec 4, 2021Updated 4 years ago
- The source code of the paper: "To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression"☆30Jan 8, 2019Updated 7 years ago
- [arXiv22] Disentangled Representation Learning for Text-Video Retrieval☆97Apr 7, 2022Updated 4 years ago