yousefkotp / Visual-Question-Answering
A lightweight deep learning model, with a web application, that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023: the answer vocabulary is carefully curated and a linear layer is added on top of OpenAI's CLIP model, which serves as both image and text encoder
☆11 · Updated last year
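How this might look in code (a minimal PyTorch sketch, not taken from the repository): the image and the question are encoded with a frozen CLIP model, the two embeddings are fused, and a single linear layer produces logits over the curated answer vocabulary instead of generating text. The class name, concatenation-based fusion, and all dimensions below are assumptions for illustration only.

```python
# Minimal sketch of the non-generative CLIP-based VQA head described above.
# Not the repository's actual code; names, fusion strategy, and sizes are assumed.
import torch
import torch.nn as nn


class ClipVqaClassifier(nn.Module):
    def __init__(self, clip_dim: int = 512, num_answers: int = 1000):
        super().__init__()
        # A single linear layer on top of concatenated CLIP image and question
        # embeddings, classifying over a fixed (curated) answer vocabulary.
        self.classifier = nn.Linear(2 * clip_dim, num_answers)

    def forward(self, image_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # image_emb, text_emb: (batch, clip_dim) features from a frozen CLIP encoder.
        fused = torch.cat([image_emb, text_emb], dim=-1)
        return self.classifier(fused)  # logits over the answer vocabulary


# Example usage with random tensors standing in for CLIP features:
if __name__ == "__main__":
    head = ClipVqaClassifier()
    logits = head(torch.randn(4, 512), torch.randn(4, 512))
    print(logits.shape)  # torch.Size([4, 1000])
```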
Alternatives and similar repositories for Visual-Question-Answering:
Users interested in Visual-Question-Answering are comparing it to the repositories listed below
- An implementation of fine-tuning the BLIP model for Visual Question Answering☆64 · Updated last year
- PyTorch implementation of image captioning using a transformer-based model.☆65 · Updated last year
- NoLA Codebase☆17 · Updated 4 months ago
- Validating image classification benchmark results on ViTs and ResNets (v2)☆12 · Updated 2 years ago
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"☆88 · Updated 3 months ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision☆36 · Updated 2 years ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆44 · Updated 7 months ago
- ☆41 · Updated last year
- PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"☆89 · Updated last year
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect…☆51 · Updated 6 months ago
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…☆18 · Updated 4 years ago
- Implementation of the Paper Scene-Graph ViT☆9 · Updated 3 months ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆35 · Updated last year
- Repository of paper Consistency-preserving Visual Question Answering in Medical Imaging (MICCAI2022)☆23 · Updated 2 years ago
- ☆10 · Updated last year
- code for studying OpenAI's CLIP explainability☆30 · Updated 3 years ago
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models"☆72 · Updated 3 weeks ago
- Multi-Aspect Vision Language Pretraining - CVPR2024☆75 · Updated 7 months ago
- ☆64 · Updated 2 months ago
- Implementation of the paper "PerSense: Personalized Instance Segmentation in Dense Images"☆24 · Updated 2 weeks ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆64 · Updated last year
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs".☆45 · Updated 7 months ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆117 · Updated 2 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆36 · Updated last year
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b…☆71 · Updated last year
- Fine-tuning CLIP for Few-Shot Learning☆40 · Updated 3 years ago
- 【AAAI 2024】An Empirical Study of CLIP for Text-based Person Search☆60 · Updated 11 months ago
- This repository contains code for fine-tuning the LLaVA-1.6-7b-mistral (multimodal LLM) model.☆31 · Updated 4 months ago
- Hyperparameter analysis for Image Captioning using LSTMs and Transformers☆26 · Updated last year
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"☆66 · Updated 3 years ago