shenxiang-vqa/LSAT

Local self-attention in Transformer for visual question answering

☆13

Alternatives and similar repositories for LSAT

Users who are interested in LSAT are comparing it to the repositories listed below.

keep-smile-001 / opentqa
opentqa is an open framework for textbook question answering, which includes xtqa, mcan, cmr, mfb, and mutan.
☆11 · Mar 27, 2021 · Updated 5 years ago
rentainhe / TRAR-VQA
[ICCV 2021] Official implementation of the paper "TRAR: Routing the Attention Spans in Transformers for Visual Question Answering"
☆68 · Oct 11, 2021 · Updated 4 years ago
val-iisc / RMLVQA
☆19 · May 31, 2023 · Updated 3 years ago
ovguyo / captions-in-VQA
Using image captions with LLM for zero-shot VQA
☆19 · Mar 14, 2024 · Updated 2 years ago
alexandrosXe / A-Simple-Baseline-For-Knowledge-Based-VQA
Repo for the EMNLP 2023 paper "A Simple Baseline for Knowledge-Based Visual Question Answering"
☆25 · Dec 14, 2023 · Updated 2 years ago
xiaojino / RUArt
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
☆10 · Nov 27, 2022 · Updated 3 years ago
prdwb / okvqa-release
☆15 · May 10, 2021 · Updated 5 years ago
expertailab / ISAAQ
☆10 · Oct 1, 2020 · Updated 5 years ago
TIAN-viola / DynRT
Official implementation of Dynamic Routing Transformer Network for Multimodal Sarcasm Detection (ACL'23)
☆35 · Jul 9, 2023 · Updated 3 years ago
alirezasalemi7 / DEDR-MM-FiD
The code for the paper "A Symmetric Dual Encoding Dense Retrieval Framework for Knowledge-Intensive Visual Question Answering"
☆14 · Aug 22, 2023 · Updated 2 years ago
983632847 / All-in-One
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
☆21 · Feb 11, 2025 · Updated last year
downdric / MSD
The official implementation of the paper "DIP: Dual Incongruity Perceiving Network for Sarcasm Detection"
☆36 · Dec 6, 2024 · Updated last year
szzexpoi / POEM
Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…
☆10 · Jun 16, 2024 · Updated 2 years ago
CR-Gjx / Img2Prompt
Evaluation codes of "From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models".
☆17 · May 15, 2023 · Updated 3 years ago
zchoi / SPT
[TCSVT23] Official code for "SPT: Spatial Pyramid Transformer for Image Captioning".
☆10 · Aug 14, 2024 · Updated last year
LCS2-IIITD / MSH-COMICS
Multi-modal Sarcasm Detection and Humor Classification in Code-mixed Conversations
☆13 · May 31, 2021 · Updated 5 years ago
PhoebusSi / MMBS
Code for our EMNLP-2022 paper: "Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning"
☆16 · Feb 22, 2023 · Updated 3 years ago
guoyang9 / UnifER
Official implementation for the MM'22 paper.
☆14 · Jun 30, 2022 · Updated 4 years ago
GeWu-Lab / MWAFM
Multi-Scale Attention for Audio Question Answering
☆28 · Jul 19, 2023 · Updated 3 years ago
AlonMendelson / SGVL
☆17 · Dec 13, 2023 · Updated 2 years ago
jingjing12110 / MixPHM
[CVPR 2023] PyTorch Code of MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
☆17 · Jul 11, 2023 · Updated 3 years ago
mshukor / EvALign-ICL
[ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context …
☆22 · Mar 1, 2024 · Updated 2 years ago
CCIIPLab / DPT
The code for the IJCAI 2022 paper "Declaration-based Prompt Tuning for Visual Question Answering"
☆20 · May 10, 2022 · Updated 4 years ago
GaochangWu / FMF-Benchmark
This is a cross-modal benchmark for industrial anomaly detection.
☆27 · Jun 8, 2026 · Updated last month
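renfei / SpringCloudDemo
An introductory Spring Cloud microservices tutorial with basic integration demos of Eureka service registration and discovery, the Config configuration center, the Bus message bus, the FeignClient client, the Zuul gateway, Hystrix circuit breaking and fallback, Stream message queues, Sleuth tracing, and Swagger documentation.
☆11 · Aug 26, 2024 · Updated last year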
ronghanghu / gqa_single_hop_baseline
A simple but well-performing "single-hop" visual attention model for the GQA dataset
☆20 · Aug 8, 2019 · Updated 6 years ago
chengtan9907 / mc-cot
The official implementation of the ECCV'24 paper MC-CoT: Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models w…
☆26 · May 19, 2024 · Updated 2 years ago
facebookresearch / selective-vqa_ood
Implementation for the CVPR 2023 paper "Improving Selective Visual Question Answering by Learning from Your Peers" (https://arxiv.org/abs…
☆26 · Jul 20, 2023 · Updated 3 years ago
GaryJiajia / OFv2_ICL_VQA
[CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering
☆21 · May 28, 2025 · Updated last year
Tecmus / BasicNLP
Part-of-speech tagging and word segmentation implemented with an HMM (hidden Markov model)
☆10 · Sep 28, 2017 · Updated 8 years ago
rabiulcste / vqazero
Visual question answering prompting recipes for large vision-language models
☆29 · Sep 14, 2024 · Updated last year
minghangz / SPL
Generating Structured Pseudo Labels for Noise-resistant Zero-shot Video Sentence Localization
☆16 · Jul 20, 2023 · Updated 3 years ago
ForJadeForest / LIVE-Learnable-In-Context-Vector
[NeurIPS 2024] The implementation of LIVE: Learnable In-Context Vector for Visual Question Answering https://arxiv.org/abs/2406.13185
☆23 · May 31, 2025 · Updated last year
yousefkotp / Visual-Question-Answering
A lightweight deep learning model with a web application to answer image-based questions with a non-generative approach for the Viz…
☆15 · Jun 27, 2023 · Updated 3 years ago
rajatkoner08 / Graphhopper
This is the code repository for Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
☆19 · Oct 30, 2021 · Updated 4 years ago
biboamy / music-repro
☆17 · Nov 7, 2023 · Updated 2 years ago
aurooj / WeakGroundedVQA_Capsules
☆18 · Apr 10, 2023 · Updated 3 years ago
Raymond-sci / EMB
PyTorch Implementation of ECCV'22 paper: Video Activity Localisation with Uncertainties in Temporal Boundary
☆17 · Jul 17, 2022 · Updated 4 years ago
sheng-n / lncRNA-disease-methods
☆26 · Jan 13, 2024 · Updated 2 years ago