yousefkotp / Visual-Question-Answering
A lightweight deep learning model, with a web application, that answers image-based questions for the VizWiz Grand Challenge 2023 using a non-generative approach: the answer vocabulary is carefully curated and a linear layer is added on top of OpenAI's CLIP model, which serves as both image and text encoder.
☆13 · Updated 2 years ago
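The description above amounts to treating VQA as classification over a curated answer vocabulary, with CLIP providing the image and question features and a linear head on top. A minimal PyTorch sketch of that setup is shown below; the class name, the concatenation-based fusion, the frozen-CLIP choice, and the placeholder answer vocabulary are illustrative assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git


class ClipVQAClassifier(nn.Module):
    """Non-generative VQA: predict one answer from a fixed, curated vocabulary."""

    def __init__(self, num_answers: int, clip_name: str = "ViT-B/32",
                 embed_dim: int = 512, device: str = "cpu"):
        super().__init__()
        # Load CLIP as a frozen image/text encoder; embed_dim is 512 for ViT-B/32.
        self.clip_model, self.preprocess = clip.load(clip_name, device=device)
        for p in self.clip_model.parameters():
            p.requires_grad = False
        # Trainable linear head over the concatenated image + question features.
        self.head = nn.Linear(2 * embed_dim, num_answers)

    def forward(self, images: torch.Tensor, question_tokens: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            img_feat = self.clip_model.encode_image(images).float()
            txt_feat = self.clip_model.encode_text(question_tokens).float()
        fused = torch.cat([img_feat, txt_feat], dim=-1)
        return self.head(fused)  # logits over the curated answer vocabulary


# Usage sketch (answer_vocab and the image path are placeholders):
# from PIL import Image
# model = ClipVQAClassifier(num_answers=len(answer_vocab))
# image = model.preprocess(Image.open("example.jpg")).unsqueeze(0)
# tokens = clip.tokenize(["what color is the mug?"])
# answer = answer_vocab[model(image, tokens).argmax(dim=-1).item()]
```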
Alternatives and similar repositories for Visual-Question-Answering
Users interested in Visual-Question-Answering are comparing it to the repositories listed below
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using the VQA v2.0 dataset for open-ended ta… ☆20 · Updated 5 years ago
- Pytorch implementation of image captioning using a transformer-based model. ☆68 · Updated 2 years ago
- Implementation of fine-tuning the BLIP model for Visual Question Answering ☆83 · Updated last year
- [ECCV'22] Official repository of the paper "Class-agnostic Object Detection with Multi-modal Transformer". ☆313 · Updated 2 years ago
- Hyperparameter analysis for Image Captioning using LSTMs and Transformers ☆26 · Updated last year
- Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022) ☆65 · Updated last year
- Code for studying OpenAI's CLIP explainability ☆34 · Updated 3 years ago
- [NeurIPS 2022] Official repository of the paper "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary … ☆295 · Updated 2 years ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆196 · Updated 2 years ago
- [ICCV 2023] CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No ☆139 · Updated last year
- ☆10 · Updated 2 years ago
- [CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations" ☆19 · Updated last year
- [ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning" ☆59 · Updated 2 years ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 2 years ago
- Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22) ☆40 · Updated last year
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆93 · Updated 9 months ago
- Recent Advances in Vision and Language Pre-training (VLP) ☆294 · Updated 2 years ago
- Few-shot Object Counting and Detection (ECCV 2022) ☆77 · Updated 10 months ago
- MixGen: A New Multi-Modal Data Augmentation ☆127 · Updated 2 years ago
- This repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-languag… ☆230 · Updated 3 years ago
- Fine-tuning CLIP for Few-Shot Learning ☆45 · Updated 3 years ago
- [CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space" ☆402 · Updated last year
- [Kaggle] 3rd place solution ☆31 · Updated 4 years ago
- ☆129 · Updated 3 years ago
- PyTorch implementation of Boosting Multi-Label Image Classification with Complementary Parallel Self-Distillation, IJCAI 2022. ☆25 · Updated 3 years ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆130 · Updated last month
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022 ☆96 · Updated 2 years ago
- PyTorch implementation of the ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation" ☆94 · Updated 2 years ago
- Using LSTM or Transformer to solve Image Captioning in Pytorch ☆79 · Updated 4 years ago
- Awesome List of Vision Language Prompt Papers ☆47 · Updated last year