ovguyo / captions-in-VQA
Using image captions with LLM for zero-shot VQA
☆18 Mar 14, 2024 Updated last year
Alternatives and similar repositories for captions-in-VQA
Users who are interested in captions-in-VQA are comparing it to the libraries listed below.
- Local self-attention in Transformer for visual question answering ☆13 Mar 17, 2024 Updated last year
- ☆18 May 31, 2023 Updated 2 years ago
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin… ☆10 Jun 16, 2024 Updated last year
- [ECCV2022] Rethinking Data Augmentation for Robust Visual Question Answering ☆13 Nov 23, 2022 Updated 3 years ago
- ☆14 May 10, 2021 Updated 4 years ago
- [CVPR 23] Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images! ☆17 May 14, 2024 Updated last year
- Implementation of ConceptBert: Concept-Aware Representation for Visual Question Answering ☆31 Apr 30, 2024 Updated last year
- Official implementation of "CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusi… ☆18 Sep 5, 2024 Updated last year
- Official implementation for the MM'22 paper. ☆14 Jun 30, 2022 Updated 3 years ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering ☆21 May 28, 2025 Updated 8 months ago
- The code of IJCAI2022 paper, Declaration-based Prompt Tuning for Visual Question Answering ☆20 May 10, 2022 Updated 3 years ago
- Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22) ☆48 Nov 3, 2022 Updated 3 years ago
- [ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering" ☆69 Oct 11, 2021 Updated 4 years ago
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering" ☆25 Dec 14, 2023 Updated 2 years ago
- Visual question answering prompting recipes for large vision-language models ☆28 Sep 14, 2024 Updated last year
- ☆27 Oct 7, 2021 Updated 4 years ago
- Official implementation of the paper "ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval" ☆27 Dec 6, 2023 Updated 2 years ago
- Vecna is a Python chatbot which recommends songs and movies depending upon your feelings ☆11 Jun 28, 2022 Updated 3 years ago
- Generative Bias for Robust Visual Question Answering (CVPR 2023) ☆28 Jul 4, 2023 Updated 2 years ago
- [TIP2023] The code of “Plug-and-Play Regulators for Image-Text Matching” ☆34 Apr 11, 2024 Updated last year
- [ICLR 2025] Official PyTorch Implementation for CPE: Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Ga… ☆12 Apr 7, 2025 Updated 10 months ago
- [Paper][ISWC 2021] Zero-shot Visual Question Answering using Knowledge Graph ☆72 Feb 9, 2024 Updated 2 years ago
- [ICRA 2024] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection ☆12 Feb 6, 2024 Updated 2 years ago
- ☆30 Dec 16, 2022 Updated 3 years ago
- Evaluation benchmark for the task of Semantic Image Translation. Contains code to run FlexIT (CVPR 2022) ☆34 Mar 25, 2022 Updated 3 years ago
- [ICML 2024] CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers ☆34 Dec 30, 2024 Updated last year
- This repository is the code of paper "Multi-level Metric Learning for Few-shot Image Recognition" (ICANN-2022) ☆34 Dec 13, 2022 Updated 3 years ago
- Position Focused Attention Network for Image-Text Matching ☆69 Aug 20, 2019 Updated 6 years ago
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆41 Mar 23, 2024 Updated last year
- [CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models" ☆145 Jun 20, 2024 Updated last year
- Implementation of the CVPR2025 paper LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty. ☆16 Sep 10, 2025 Updated 5 months ago
- This project predicts wind turbine failure using numerous sensor data by applying classification based ML models that improves prediction… ☆11 Mar 20, 2023 Updated 2 years ago
- Tally Prime MCP (Model Context Protocol) Server implementation to feed Tally ERP data to popular LLM like Claude, ChatGPT supporting MCP ☆15 Nov 11, 2025 Updated 3 months ago
- We archive data because we are interested in the diffs. All data is from https://video-api.cartoonnetwork.com. We run the check every min… ☆10 Updated this week
- Extract information from XBRL files in the ESEF format ☆13 Jan 3, 2026 Updated last month
- ☆10 Jul 29, 2022 Updated 3 years ago
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022 ☆11 Apr 13, 2025 Updated 10 months ago
- Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models ☆41 Jun 9, 2023 Updated 2 years ago
- [CVPR 2021] Smoothing the Disentangled Latent Style Space for Unsupervised I2I Translation ☆42 Mar 17, 2023 Updated 2 years ago