yousefkotp / Visual-Question-Answering

A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder

☆13

Alternatives and similar repositories for Visual-Question-Answering

Users who are interested in Visual-Question-Answering are comparing it to the repositories listed below.

ntusteeian / VQA_CNN-LSTM
PyTorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…
☆20Updated 5 years ago
zarzouram / image_captioning_with_transformers
PyTorch implementation of image captioning using a transformer-based model.
☆68Updated 2 years ago
dino-chiio / blip-vqa-finetune
This is an implementation of finetuning the BLIP model for Visual Question Answering
☆83Updated last year
mmaaz60 / mvits_for_class_agnostic_od
[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
☆313Updated 2 years ago
aravindvarier / Image-Captioning-Pytorch
Hyperparameter analysis for Image Captioning using LSTMs and Transformers
☆26Updated last year
sunxm2357 / DualCoOp
Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022)
☆65Updated last year
sMamooler / CLIP_Explainability
Code for studying OpenAI's CLIP explainability
☆34Updated 3 years ago
hanoonaR / object-centric-ovd
[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary …
☆295Updated 2 years ago
davidnvq / grit
GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)
☆196Updated 2 years ago
xmed-lab / CLIPN
ICCV 2023: CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No
☆139Updated last year
mmaaz60 / mdef_detr
☆10Updated 2 years ago
uvavision / AMC-grounding
[CVPR 2023] Code for "Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations"
☆19Updated last year
ivonajdenkoska / multimodal-meta-learn
[ICLR 2023] Official code repository for "Meta Learning to Bridge Vision and Language Models for Multimodal Few-Shot Learning"
☆59Updated 2 years ago
YulongBonjour / SimVLM
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
☆36Updated 2 years ago
HCPLab-SYSU / HCP-MLR-PL
Multi-label Image Recognition with Partial Labels (IJCV'24, ESWA'24, AAAI'22)
☆40Updated last year
jchenghu / ExpansionNet_v2
Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"
☆93Updated 9 months ago
phellonchen / awesome-Vision-and-Language-Pre-training
Recent Advances in Vision and Language Pre-training (VLP)
☆294Updated 2 years ago
VinAIResearch / Counting-DETR
Few-shot Object Counting and Detection (ECCV 2022)
☆77Updated 10 months ago
amazon-science / mix-generation
MixGen: A New Multi-Modal Data Augmentation
☆127Updated 2 years ago
junchen14 / Multi-Modal-Transformer
This repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-languag…
☆230Updated 3 years ago
b-hahn / CLIP
Finetuning CLIP for Few Shot Learning
☆45Updated 3 years ago
microsoft / UniCL
[CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space"
☆402Updated last year
TomYanabe / Cassava-Leaf-Disease-Classification
[kaggle] 3rd place solution
☆31Updated 4 years ago
KennithLi / Awesome-Zero-Shot-Object-Detection
☆129Updated 3 years ago
Robbie-Xu / CPSD
PyTorch implementation of Boosting Multi-Label Image Classification with Complementary Parallel Self-Distillation, IJCAI 2022.
☆25Updated 3 years ago
linhuixiao / CLIP-VG
[TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.
☆130Updated last month
yangli18 / VLTVG
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022
☆96Updated 2 years ago
ArrowLuo / SegCLIP
PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"
☆94Updated 2 years ago
RoyalSkye / Image-Caption
Using LSTM or Transformer to solve Image Captioning in PyTorch
☆79Updated 4 years ago
Hodasia / Awesome-Vision-Language-Finetune
Awesome List of Vision Language Prompt Papers
☆47Updated last year