yousefkotp / Visual-Question-Answering
A Light weight deep learning model with with a web application to answer image-based questions with a non-generative approach for the VizWiz grand challenge 2023 by carefully curating the answer vocabulary and adding linear layer on top of Open AI's CLIP model as image and text encoder
☆11Updated last year
Alternatives and similar repositories for Visual-Question-Answering:
Users that are interested in Visual-Question-Answering are comparing it to the libraries listed below
- 【AAAI 2024】An Empirical Study of CLIP for Text-based Person Search☆63Updated last year
- ☆10Updated last year
- Archive of Tasks and Results of the Video Browser Showdown☆11Updated last month
- [ACM TOMM 2023] - Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features☆177Updated last year
- Pytorch implementation of VQA: Visual Question Answering (https://arxiv.org/pdf/1505.00468.pdf) using VQA v2.0 dataset for open-ended ta…☆19Updated 4 years ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆66Updated last year
- [WACV 2025] Official code for our paper "Enhancing Novel Object Detection via Cooperative Foundational Models"☆74Updated last month
- The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"☆154Updated 8 months ago
- ☆11Updated last year
- CLIP4IDC: CLIP for Image Difference Captioning (AACL 2022)☆34Updated 2 years ago
- Implementation of LaTr: Layout-aware transformer for scene-text VQA,a novel multimodal architecture for Scene Text Visual Question Answer…☆53Updated 5 months ago
- PyTorch implementation of ICML 2023 paper "SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation"☆90Updated last year
- SimVLM ---SIMPLE VISUAL LANGUAGE MODEL PRETRAINING WITH WEAK SUPERVISION☆36Updated 2 years ago
- Few-shot Object Counting and Detection (ECCV 2022)☆70Updated 5 months ago
- Implementation code of the work "Exploiting Multiple Sequence Lengths in Fast End to End Training for Image Captioning"☆91Updated 4 months ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching”☆33Updated last year
- This is implementation of finetuning BLIP model for Visual Question Answering☆65Updated last year
- Exploring multimodal fusion-type transformer models for visual question answering (on DAQUAR dataset)☆34Updated 3 years ago
- Word4Per is an innovative framework for Zero-Shot Composed Person Retrieval (ZS-CPR), integrating visual and textual information for enha…☆22Updated 4 months ago
- SeqTR: A Simple yet Universal Network for Visual Grounding☆134Updated 5 months ago
- A paper list of image captioning.☆22Updated 3 years ago
- ☆29Updated last year
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect…☆53Updated 3 weeks ago
- WACV 2022 Paper - Is An Image Worth Five Sentences? A New Look into Semantics for Image-Text Matching☆16Updated 3 years ago
- [SIGIR 2024] - Simple but Effective Raw-Data Level Multimodal Fusion for Composed Image Retrieval☆34Updated 9 months ago
- Code for Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID (CVPR 2024)☆64Updated 9 months ago
- A DETR-style framework for open-vocabulary detection (OVD). CVPR 2023☆190Updated 2 years ago
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆81Updated last year
- Code for ECCV 2022 Workshop paper "See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval"☆21Updated last year
- Towards Local Visual Modeling for Image Captioning☆28Updated 2 years ago