Using image captions with LLM for zero-shot VQA
☆18 · Mar 14, 2024 · Updated last year
Alternatives and similar repositories for captions-in-VQA
Users that are interested in captions-in-VQA are comparing it to the libraries listed below
- Local self-attention in Transformer for visual question answering ☆13 · Mar 17, 2024 · Updated last year
- ☆18 · May 31, 2023 · Updated 2 years ago
- [ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering ☆13 · Nov 23, 2022 · Updated 3 years ago
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin… ☆10 · Jun 16, 2024 · Updated last year
- [CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! ☆17 · May 14, 2024 · Updated last year
- ☆14 · May 10, 2021 · Updated 4 years ago
- Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering ☆31 · Apr 30, 2024 · Updated last year
- official implementation of "CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusi… ☆18 · Sep 5, 2024 · Updated last year
- Official implementation for the MM'22 paper. ☆14 · Jun 30, 2022 · Updated 3 years ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering ☆21 · May 28, 2025 · Updated 9 months ago
- The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering ☆20 · May 10, 2022 · Updated 3 years ago
- Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22) ☆48 · Nov 3, 2022 · Updated 3 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆69 · Oct 11, 2021 · Updated 4 years ago
- visual question answering prompting recipes for large vision-language models ☆28 · Sep 14, 2024 · Updated last year
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering" ☆25 · Dec 14, 2023 · Updated 2 years ago
- Vecna is a Python chatbot which recommends songs and movies depending upon your feelings ☆12 · Jun 28, 2022 · Updated 3 years ago
- ☆27 · Oct 7, 2021 · Updated 4 years ago
- Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval" ☆28 · Dec 6, 2023 · Updated 2 years ago
- Generative Bias for Robust Visual Question Answering ( CVPR 2023 ) ☆28 · Jul 4, 2023 · Updated 2 years ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆34 · Apr 11, 2024 · Updated last year
- [ICRA 2024] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection ☆12 · Feb 6, 2024 · Updated 2 years ago
- [ICLR 2025] Official PyTorch Implementation for CPE: Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Ga… ☆12 · Apr 7, 2025 · Updated 11 months ago
- [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph ☆72 · Feb 9, 2024 · Updated 2 years ago
- Evaluation benchmark for the task of Semantic Image Translation. Contains code to run FlexIT (CVPR 2022) ☆34 · Mar 25, 2022 · Updated 3 years ago
- ☆30 · Dec 16, 2022 · Updated 3 years ago
- This repository is the code of the paper "Multi-level Metric Learning for Few-shot Image Recognition" (ICANN-2022) ☆34 · Dec 13, 2022 · Updated 3 years ago
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers ☆34 · Dec 30, 2024 · Updated last year
- Position Focused Attention Network for Image-Text Matching ☆69 · Aug 20, 2019 · Updated 6 years ago
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆145 · Jun 20, 2024 · Updated last year
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆41 · Mar 23, 2024 · Updated last year
- Extract information from XBRL files in the ESEF format ☆13 · Jan 3, 2026 · Updated 2 months ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022 ☆11 · Apr 13, 2025 · Updated 10 months ago
- This project predicts wind turbine failure using numerous sensor data by applying classification based ML models that improves prediction… ☆11 · Mar 20, 2023 · Updated 2 years ago
- Official Repo For AAAI 2026 Accepted Paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" ☆29 · Jan 13, 2026 · Updated last month
- [SIGIR 2025] Official impl. of "MRAMG-Bench: A Comprehensive Benchmark for Advancing Multimodal Retrieval-Augmented Multimodal Generation… ☆17 · Apr 15, 2025 · Updated 10 months ago
- Implementation of the CVPR2025 paper LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty. ☆17 · Sep 10, 2025 · Updated 5 months ago
- ☆10 · Jul 29, 2022 · Updated 3 years ago
- We archive data because we are interested in the diffs. All data is from https://video-api.cartoonnetwork.com. We run the check every min… ☆10 · Updated this week
- Official Code of IdealGPT ☆35 · Updated this week