☆72 · Jul 17, 2024 · Updated last year
Alternatives and similar repositories for QA-ViT
Users that are interested in QA-ViT are comparing it to the libraries listed below.
- ☆24 · Apr 29, 2025 · Updated last year
- ☆13 · May 21, 2024 · Updated 2 years ago
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024. ☆16 · Oct 25, 2024 · Updated last year
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions ☆55 · Apr 17, 2024 · Updated 2 years ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models. ☆78 · Jun 25, 2025 · Updated last year
- Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR) ☆83 · Sep 12, 2023 · Updated 2 years ago
- ☆15 · May 10, 2026 · Updated last month
- Ada-LISTA: Learned Solvers Adaptive to Varying Models ☆11 · Feb 18, 2020 · Updated 6 years ago
- EMMA [TMLR 2025] ☆14 · Sep 25, 2025 · Updated 9 months ago
- Code for paper DMCVR: Morphology-Guided Diffusion Model for 3D Cardiac Volume Reconstruction ☆13 · Jan 12, 2024 · Updated 2 years ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs ☆29 · Aug 15, 2025 · Updated 10 months ago
- STVQA and TextVQA OCR results from Amazon Text in Image pipeline ☆12 · Jul 18, 2022 · Updated 3 years ago
- ☆16 · Dec 25, 2021 · Updated 4 years ago
- [AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA ☆27 · Jul 12, 2024 · Updated last year
- ☆14 · May 3, 2022 · Updated 4 years ago
- ☆32 · Jul 29, 2024 · Updated last year
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers ☆21 · Jul 26, 2022 · Updated 3 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation" ☆23 · Jun 26, 2023 · Updated 3 years ago
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering… ☆13 · Feb 18, 2023 · Updated 3 years ago
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024) ☆16 · Oct 29, 2024 · Updated last year
- Code for paper: "Privately generating tabular data using language models". ☆16 · Jun 13, 2023 · Updated 3 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆327 · Jan 20, 2025 · Updated last year
- ☆22 · Jun 5, 2025 · Updated last year
- Some papers about *diverse* image (a few videos) captioning