yousefkotp / Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach for the VizWiz Grand Challenge 2023, by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder
☆14Updated 2 years ago
Alternatives and similar repositories for Visual-Question-Answering
Users that are interested in Visual-Question-Answering are comparing it to the libraries listed below
- Pytorch implementation of image captioning using transformer-based model.☆68Updated 2 years ago
- 【AAAI 2024】An Empirical Study of CLIP for Text-based Person Search☆73Updated last year
- Hyperparameter analysis for Image Captioning using LSTMs and Transformers☆26Updated 2 years ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆192Updated 2 years ago
- An implementation of fine-tuning the BLIP model for Visual Question Answering☆83Updated 2 years ago
- [ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".☆316Updated 2 years ago
- Few-shot Object Counting and Detection (ECCV 2022)☆83Updated last year
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"☆94Updated last year
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)☆198Updated 2 years ago
- code for studying OpenAI's CLIP explainability☆38Updated 4 years ago
- Code for Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID (CVPR 2024)☆85Updated last year
- Using LSTM or Transformer to solve Image Captioning in Pytorch☆79Updated 4 years ago
- ICCV 2023: CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No☆141Updated 2 years ago
- Implementation of our paper, 'Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval.'☆28Updated 2 years ago
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…☆21Updated 5 years ago
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆132Updated 3 months ago
- In this work, we implement different cross-modal learning schemes such as Siamese Network, Correlational Network and Deep Cross-Modal Pro…☆11Updated 4 years ago
- Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023)☆268Updated 10 months ago
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION☆36Updated 3 years ago
- [ACM MM23] CLIP-Count: Towards Text-Guided Zero-Shot Object Counting☆123Updated last year
- ☆53Updated 2 years ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval"☆55Updated last year
- [NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary …☆297Updated 3 years ago
- A new framework for open-vocabulary object detection, based on maskrcnn-benchmark☆248Updated 3 years ago
- NoLA Codebase☆28Updated 9 months ago
- Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022)☆71Updated 2 years ago
- Implementation of the paper CPTR : FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING☆31Updated 3 years ago
- This repo contains the code of "Contrastive Supervised Distillation for Continual Representation Learning", Tommaso Barletti, Niccolò Bio…☆20Updated 3 years ago
- A curated list of Computer Vision related conferences with dates and paper registration deadlines.☆47Updated 3 months ago
- The official implementation for BLIP4CIR with bi-directional training | Bi-directional Training for Composed Image Retrieval via Text Pro…☆34Updated 2 years ago