yousefkotp / Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023. It relies on a carefully curated answer vocabulary and adds a linear layer on top of OpenAI's CLIP model, which serves as the image and text encoder.
☆10 Updated last year
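The non-generative approach described above treats question answering as classification over a fixed answer vocabulary. A minimal sketch of that idea follows, under several assumptions that are not taken from the repository: the random vectors stand in for CLIP image and text embeddings, the two embeddings are fused by simple concatenation, and `ANSWER_VOCAB`, `EMBED_DIM`, and `predict_answer` are hypothetical names used only for illustration.

```python
import math
import random

# Hypothetical curated answer vocabulary (illustration only).
ANSWER_VOCAB = ["yes", "no", "unanswerable", "white", "black"]
EMBED_DIM = 8  # stand-in size; real CLIP embeddings are much larger


def linear(x, weights, bias):
    """Apply a dense layer: one output score per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(image_emb, text_emb, weights, bias):
    """Fuse both embeddings, score every vocabulary answer, return the best one."""
    fused = image_emb + text_emb  # assumption: fusion by concatenation
    probs = softmax(linear(fused, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return ANSWER_VOCAB[best], probs


# Random stand-ins for the encoder outputs and the (untrained) linear layer.
rng = random.Random(0)
image_emb = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
text_emb = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
weights = [[rng.gauss(0, 0.1) for _ in range(2 * EMBED_DIM)] for _ in ANSWER_VOCAB]
bias = [0.0] * len(ANSWER_VOCAB)

answer, probs = predict_answer(image_emb, text_emb, weights, bias)
print(answer, round(sum(probs), 6))
```

Because the model only picks among vocabulary entries, it can never produce an answer outside that list, which is why curating the vocabulary matters for this kind of approach.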
Alternatives and similar repositories for Visual-Question-Answering:
Users who are interested in Visual-Question-Answering are comparing it to the libraries listed below:
- Pytorch implementation of image captioning using transformer-based model. ☆62 Updated last year
- In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Pro… ☆11 Updated 3 years ago
- Training DETR on Custom Dataset for Object Detection ☆14 Updated 3 years ago
- code for studying OpenAI's CLIP explainability ☆29 Updated 3 years ago
- ☆40 Updated last year
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆43 Updated 6 months ago
- CaMEL: Mean Teacher Learning for Image Captioning. ICPR 2022 ☆29 Updated 2 years ago
- [ICLR'24] Consistency-guided Prompt Learning for Vision-Language Models ☆66 Updated 8 months ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆66 Updated 3 years ago
- This repository provides a comprehensive collection of research papers focused on multimodal representation learning, all of which have b… ☆71 Updated last year
- NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral) ☆47 Updated last year
- Using LSTM or Transformer to solve Image Captioning in Pytorch ☆76 Updated 3 years ago
- Towards Local Visual Modeling for Image Captioning ☆27 Updated last year
- Code for Label Propagation for Zero-shot Classification with Vision-Language Models (CVPR2024) ☆36 Updated 6 months ago
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION ☆36 Updated 2 years ago
- Image Captioning using CNN and Transformer. ☆51 Updated 3 years ago
- With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning. ICCV 2023 ☆16 Updated 8 months ago
- A self-evident application of the VQA task is to design systems that aid blind people with sight reliant queries. The VizWiz VQA dataset … ☆15 Updated last year
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022) ☆32 Updated 2 years ago
- A paper list of image captioning. ☆22 Updated 2 years ago
- Validating image classification benchmark results on ViTs and ResNets (v2) ☆12 Updated 2 years ago
- ☆33 Updated last month
- Official Implementation of Few-shot Visual Relationship Co-localization ☆25 Updated 3 years ago
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆86 Updated last month
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆33 Updated 10 months ago
- Official PyTorch implementation of our CVPR 2022 paper: Beyond a Pre-Trained Object Detector: Cross-Modal Textual and Visual Context for … ☆60 Updated 2 years ago
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect… ☆50 Updated 4 months ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆116 Updated last month
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta… ☆17 Updated 4 years ago
- PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation" ☆87 Updated last year