yousefkotp / Visual-Question-AnsweringLinks
A Light weight deep learning model with with a web application to answer image-based questions with a non-generative approach for the VizWiz grand challenge 2023 by carefully curating the answer vocabulary and adding linear layer on top of Open AI's CLIP model as image and text encoder
☆11Updated last year
Alternatives and similar repositories for Visual-Question-Answering
Users that are interested in Visual-Question-Answering are comparing it to the libraries listed below
Sorting:
- A paper list of image captioning.☆22Updated 3 years ago
- Pytorch implementation of image captioning using transformer-based model.☆66Updated 2 years ago
- This is implementation of finetuning BLIP model for Visual Question Answering☆68Updated last year
- 【AAAI 2024】An Empirical Study of CLIP for Text-based Person Search☆66Updated last year
- code for studying OpenAI's CLIP explainability☆31Updated 3 years ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆48Updated 9 months ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆178Updated last year
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION☆36Updated 2 years ago
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…☆20Updated 4 years ago
- FInetuning CLIP for Few Shot Learning☆42Updated 3 years ago
- [ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning"☆59Updated 2 years ago
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…☆31Updated last year
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models"☆77Updated 2 months ago
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"☆91Updated 5 months ago
- (ACL'2023) MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning☆35Updated 9 months ago
- [ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".☆311Updated 2 years ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆41Updated last year
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…☆48Updated 9 months ago
- Task Residual for Tuning Vision-Language Models (CVPR 2023)☆73Updated 2 years ago
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization☆105Updated last year
- [ICLR 2024] Official repository for "Vision-by-Language for Training-Free Compositional Image Retrieval"☆66Updated 11 months ago
- [Paper][AAAI 2023] DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning☆52Updated last year
- Awesome Vision-Language Pretraining Papers☆30Updated 4 months ago
- Composed Person Retrieval (CPR) is a new cross-modal retrieval task that aims to identify individuals in large-scale person image databas…☆24Updated last week
- [CVPR 2022 - Demo Track] - Effective conditioned and composed image retrieval combining CLIP-based features☆81Updated 6 months ago
- Awesome List of Vision Language Prompt Papers☆45Updated last year
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆83Updated last year
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)☆48Updated last year
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022☆29Updated 2 years ago