UMass-Embodied-AGI/VisualCoT

Codebase for the AAAI 2024 conference paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"

☆40

Alternatives and similar repositories for VisualCoT

Users who are interested in VisualCoT are comparing it to the repositories listed below.

UMass-Embodied-AGI / genome
☆16 · Apr 10, 2025 · Updated last year
szzexpoi / POEM
Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…
☆10 · Jun 16, 2024 · Updated 2 years ago
jialinwu17 / MAVEX
☆30 · Dec 16, 2022 · Updated 3 years ago
luomancs / retriever_reader_for_okvqa
☆19 · Dec 8, 2022 · Updated 3 years ago
guoyang9 / UnifER
Official implementation for the MM'22 paper.
☆14 · Jun 30, 2022 · Updated 4 years ago
ovguyo / captions-in-VQA
Using image captions with LLM for zero-shot VQA
☆19 · Mar 14, 2024 · Updated 2 years ago
Hxyou / IdealGPT
Official Code of IdealGPT
☆39 · Mar 3, 2026 · Updated 4 months ago
LouChao98 / VLGAE
Official Implementation for CVPR 2022 paper "Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language …
☆24 · Oct 19, 2022 · Updated 3 years ago
wenhuchen / Meta-Module-Network
Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"
☆43 · May 13, 2021 · Updated 5 years ago
microsoft / PICa
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
☆88 · Apr 10, 2022 · Updated 4 years ago
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆48 · Mar 18, 2024 · Updated 2 years ago
SpencerWhitehead / novelvqa
☆27 · Oct 7, 2021 · Updated 4 years ago
yuleiniu / introd
[NeurIPS 2021] Introspective Distillation for Robust Question Answering
☆13 · Dec 7, 2021 · Updated 4 years ago
GaryJiajia / OFv2_ICL_VQA
[CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering
☆21 · May 28, 2025 · Updated last year
southnx / ACoLP
Open Set Video HOI detection from Action-centric Chain-of-Look Prompting, ICCV 2023
☆12 · Oct 3, 2023 · Updated 2 years ago
rabiulcste / vqazero
Visual question answering prompting recipes for large vision-language models
☆29 · Sep 14, 2024 · Updated last year
jiazheng-xing / SloshNet
[AAAI 2023] Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition (SloshNet)
☆14 · Jan 10, 2024 · Updated 2 years ago
hexiang-hu / answer_embedding
Code Release for `Learning Answer Embeddings for Visual Question Answering`. (CVPR 2018)
☆13 · Apr 6, 2019 · Updated 7 years ago
heaplax / ARMAP
☆29 · Jun 5, 2025 · Updated last year
shenxiang-vqa / LSAT
Local self-attention in Transformer for visual question answering
☆13 · Mar 17, 2024 · Updated 2 years ago
tejas-gokhale / vqa_mutant
☆13 · Feb 14, 2022 · Updated 4 years ago
salesforce / VD-BERT
☆45 · Jun 16, 2025 · Updated last year
bowen-upenn / Multi-Agent-VQA
[CVPR 2024 CVinW] Multi-Agent VQA: Exploring Multi-Agent Foundation Models on Zero-Shot Visual Question Answering
☆22 · Sep 21, 2024 · Updated last year
ronilp / mac-network-pytorch-gqa
Memory, Attention and Composition (MAC) Network for CLEVR/GQA implemented in PyTorch
☆27 · Aug 26, 2024 · Updated last year
HITsz-TMG / Cognitive-Visual-Language-Mapper
The codes and datasets about our ACL 2024 Main Conference paper titled "Cognitive Visual-Language Mapper: Advancing Multimodal Comprehens…
☆17 · Jan 24, 2025 · Updated last year
keysg-lab / KeySG
[ICRA 2026] Official implementation of "KeySG: Hierarchical Keyframe-Based 3D Scene Graphs"
☆22 · Apr 30, 2026 · Updated 2 months ago
gqa-ood / GQA-OOD
GQA-OOD is a new dataset and benchmark for the evaluation of VQA models in OOD (out of distribution) settings.
☆33 · Mar 1, 2021 · Updated 5 years ago
Yushi-Hu / PromptCap
Natural language guided image captioning
☆89 · Feb 11, 2024 · Updated 2 years ago
showlab / CLVQA
[AAAI 2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral)
☆42 · Mar 23, 2024 · Updated 2 years ago
baoqianyue / DFC2021-Track-MSD
Third place of 2021 IEEE GRSS Data Fusion Contest: Track MSD
☆10 · Mar 31, 2021 · Updated 5 years ago
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 · Jun 20, 2024 · Updated 2 years ago
maximek3 / MIMIC-NLE
☆21 · Jul 25, 2022 · Updated 4 years ago
ThreeSR / Good-Learning-Resources
☆12 · Oct 5, 2022 · Updated 3 years ago
thomaswei-cn / MC-CoT
MC-CoT implementation code
☆23 · Jun 24, 2025 · Updated last year
aioz-ai / CFR_VQA
Coarse-to-Fine Reasoning for Visual Question Answering (CVPRW'22)
☆49 · Apr 22, 2026 · Updated 3 months ago
PhoebusSi / Thinking-while-Observing
Code for our ACL-2023 paper: "Combo of Thinking and Observing for Outside-Knowledge VQA"
☆12 · Jun 30, 2023 · Updated 3 years ago
beacon-3d / Beacon3D
[CVPR 2025] Beacon3D: Object-centric Evaluation for 3D Grounding-QA
☆28 · Nov 25, 2025 · Updated 8 months ago
yashkant / concat-vqa
Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021
☆19 · Jul 27, 2021 · Updated 4 years ago
SijieSong / CVPR21-Cogrounding_semantic_attention
☆14 · Jul 13, 2021 · Updated 5 years ago