yousefkotp / Visual-Question-Answering
A lightweight deep learning model with a web application that answers image-based questions using a non-generative approach, built for the VizWiz Grand Challenge 2023 by carefully curating the answer vocabulary and adding a linear layer on top of OpenAI's CLIP model as the image and text encoder
☆14 · Updated 2 years ago
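For context, the architecture described above (frozen CLIP image and text encoders with a linear classification head over a fixed answer vocabulary) can be sketched roughly as follows. This is a minimal illustration assuming OpenAI's CLIP package and a hypothetical class name and vocabulary size; it is not the repository's exact code.

```python
# Minimal sketch (assumed, not the repository's exact code): frozen CLIP
# encoders plus a linear head that classifies over a fixed answer vocabulary.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP: https://github.com/openai/CLIP


class ClipVQAClassifier(nn.Module):
    """Hypothetical non-generative VQA head on top of frozen CLIP encoders."""

    def __init__(self, num_answers: int, clip_name: str = "ViT-B/32", device: str = "cpu"):
        super().__init__()
        self.clip_model, self.preprocess = clip.load(clip_name, device=device)
        for p in self.clip_model.parameters():  # CLIP stays frozen
            p.requires_grad = False
        embed_dim = 512  # embedding size of ViT-B/32; adjust for other backbones
        self.head = nn.Linear(2 * embed_dim, num_answers)  # the only trainable part

    def forward(self, images: torch.Tensor, question_tokens: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            img = self.clip_model.encode_image(images).float()
            txt = self.clip_model.encode_text(question_tokens).float()
        # Concatenate image and question embeddings, then predict answer logits.
        return self.head(torch.cat([img, txt], dim=-1))


# Hypothetical usage: num_answers is the size of the curated answer vocabulary.
model = ClipVQAClassifier(num_answers=5000)
tokens = clip.tokenize(["What is written on this label?"])
# images would be a batch of images preprocessed with model.preprocess,
# then: logits = model(images, tokens)
```

Because the head predicts over a closed answer set rather than generating free-form text, the model stays small enough to serve from a web application.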
Alternatives and similar repositories for Visual-Question-Answering
Users interested in Visual-Question-Answering are comparing it to the repositories listed below
- PyTorch implementation of image captioning using a transformer-based model. ☆68 · Updated 2 years ago
- [ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer". ☆315 · Updated 2 years ago
- Implementation of fine-tuning the BLIP model for Visual Question Answering ☆83 · Updated 2 years ago
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features ☆189 · Updated 2 years ago
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning" ☆94 · Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆132 · Updated 2 months ago
- GRIT: Faster and Better Image-captioning Transformer (ECCV 2022) ☆197 · Updated 2 years ago
- PyTorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using the VQA v2.0 dataset for open-ended ta… ☆21 · Updated 5 years ago
- Hyperparameter analysis for Image Captioning using LSTMs and Transformers ☆26 · Updated 2 years ago
- [AAAI 2024] An Empirical Study of CLIP for Text-based Person Search ☆73 · Updated last year
- ☆34 · Updated 3 years ago
- [NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary … ☆297 · Updated 3 years ago
- Source code of our AAAI 2024 paper "Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval" ☆53 · Updated last year
- This repository collects various multi-modal transformer architectures, including image transformer, video transformer, image-languag… ☆233 · Updated 3 years ago
- [CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners". ☆301 · Updated last year
- [CVPR 2022] Official code for "Unified Contrastive Learning in Image-Text-Label Space" ☆405 · Updated 2 years ago
- Using LSTM or Transformer to solve Image Captioning in PyTorch ☆79 · Updated 4 years ago
- An easy-to-use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively. ☆136 · Updated last year
- Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval (CVPR 2023) ☆263 · Updated 9 months ago
- SimVLM: Simple Visual Language Model Pretraining with Weak Supervision ☆36 · Updated 3 years ago
- Contextual Object Detection with Multimodal Large Language Models ☆256 · Updated last year
- Natural language guided image captioning ☆87 · Updated last year
- Code for studying OpenAI's CLIP explainability ☆37 · Updated 4 years ago
- ☆52 · Updated 2 years ago
- An unofficial implementation of TubeViT in "Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning" ☆93 · Updated last year
- Few-shot Object Counting and Detection (ECCV 2022) ☆81 · Updated last year
- Official code for Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions ☆16 · Updated 2 years ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆126 · Updated last year
- Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning, CVPR 2022 ☆96 · Updated 3 years ago
- Object detection based on OWL-ViT ☆67 · Updated 2 years ago