☆71Jul 17, 2024Updated last year
Alternatives and similar repositories for QA-ViT
Users that are interested in QA-ViT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- Ada-LISTA: Learned Solvers Adaptive to Varying Models☆11Feb 18, 2020Updated 6 years ago
- EMMA [TMLR 2025]☆12Sep 25, 2025Updated 6 months ago
- ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs☆29Aug 15, 2025Updated 7 months ago
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆22Apr 23, 2025Updated 11 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- STVQA and TextVQA OCR results from Amazon Text in Image pipeline☆12Jul 18, 2022Updated 3 years ago
- Python Vasculature Biomarker toolbox API☆16Jan 1, 2026Updated 3 months ago
- [AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA☆27Jul 12, 2024Updated last year
- ☆14May 3, 2022Updated 3 years ago
- ☆32Jul 29, 2024Updated last year
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers☆21Jul 26, 2022Updated 3 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Jun 26, 2023Updated 2 years ago
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering…☆13Feb 18, 2023Updated 3 years ago
- [CVPR'24 Highlight] Implementation of "Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models"☆17Sep 12, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024)☆16Oct 29, 2024Updated last year
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆324Jan 20, 2025Updated last year
- ☆17Aug 11, 2023Updated 2 years ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- Code for 'Robust Federated Learning with Noisy Labels'☆15Jun 28, 2021Updated 4 years ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Jun 28, 2024Updated last year
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆28Jun 6, 2025Updated 10 months ago
- ☆30Jul 25, 2025Updated 8 months ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- Unified Audio-Visual Perception for Multi-Task Video Localization☆31Apr 19, 2024Updated last year
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆56Jan 22, 2026Updated 2 months ago
- ☆28Jul 18, 2025Updated 8 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 8 months ago
- The repo for "On-the-fly Modulation for Balanced Multimodal Learning", T-PAMI 2024☆19Sep 29, 2024Updated last year
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆19May 29, 2023Updated 2 years ago
- LongMIL for WSI analysis with extrapolation positional embedding☆14Dec 11, 2024Updated last year
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆40Mar 27, 2025Updated last year
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Mar 28, 2024Updated 2 years ago
- NordVPN Special Discount Offer • AdSave on top-rated NordVPN 1 or 2-year plans with secure browsing, privacy protection, and support for for all major platforms.
- ☆21Mar 12, 2025Updated last year
- ☆11May 24, 2024Updated last year
- ☆23Apr 19, 2024Updated last year
- ☆23Sep 28, 2023Updated 2 years ago
- Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation"☆22Nov 21, 2023Updated 2 years ago
- CycleReward is a reward model trained on cycle consistency preferences to measure image-text alignment.☆55Nov 3, 2025Updated 5 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆24Aug 13, 2024Updated last year