UMass-Foundation-Model / VisualCoT

Codebase for the AAAI 2024 paper "Visual Chain-of-Thought Prompting for Knowledge-based Visual Reasoning"

☆25

Alternatives and similar repositories for VisualCoT:

Users who are interested in VisualCoT are comparing it to the repositories listed below.

limanling / KnowledgeVL-Reading
☆67 · Updated last year
allenai / aokvqa
Official repository for the A-OKVQA dataset
☆75 · Updated 9 months ago
pkunlp-icler / MIC
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU
☆46 · Updated last year
jialinwu17 / MAVEX
☆28 · Updated 2 years ago
guoyang9 / UnifER
Official implementation for the MM'22 paper.
☆12 · Updated 2 years ago
szzexpoi / rex
Official repository for the CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation"
☆21 · Updated last year
vipulgupta1011 / swapmix
☆19 · Updated 2 years ago
LisaAnne / Hallucination
☆64 · Updated 5 years ago
bcmi / Causal-VidQA
[CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…
☆53 · Updated 7 months ago
guilk / KAT
Research code for "KAT: A Knowledge Augmented Transformer for Vision-and-Language"
☆63 · Updated 2 years ago
aditya10 / VLC-BERT
Code for the WACV 2023 paper "VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge"
☆21 · Updated last year
open-vision-language / oven
☆32 · Updated last year
open-vision-language / infoseek
☆39 · Updated last year
jingchenchen / ReasoningConsistency-VQA
☆12 · Updated 2 years ago
Hxyou / IdealGPT
Official code for IdealGPT
☆34 · Updated last year
VT-NLP / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆135 · Updated last year
zjuchenlong / WSAG
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14 · Updated last year
X-PLUG / mPLUG-HalOwl
mPLUG-HalOwl: Multimodal Hallucination Evaluation and Mitigating
☆89 · Updated last year
thunlp / PEVL
Source code for the EMNLP 2022 paper “PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models”
☆48 · Updated 2 years ago
szzexpoi / POEM
Official implementation for the CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…
☆10 · Updated 8 months ago
YiyangZhou / LURE
[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
☆141 · Updated 9 months ago
LouChao98 / VLGAE
Official implementation for the CVPR 2022 paper "Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language …
☆23 · Updated 2 years ago
PVIT-official / PVIT
Repository for the paper "Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models"
☆37 · Updated last year
HenryHZY / VL-PET
[ICCV 2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆53 · Updated last year
edchengg / oven_eval
ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities
☆37 · Updated 5 months ago
UMass-Foundation-Model / CoVLM
Official implementation for CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding
☆44 · Updated last year
Zhiquan-Wen / D-VQA
PyTorch implementation of "Debiased Visual Question Answering from Feature and Sample Perspectives" (NeurIPS 2021)
☆24 · Updated 2 years ago
AndersonStra / MuKEA
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
☆93 · Updated last year
SooLab / DDCOT
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
☆38 · Updated 11 months ago
bcdnlp / FAITHSCORE
☆28 · Updated 3 months ago