☆71Jul 17, 2024Updated last year
Alternatives and similar repositories for QA-ViT
Users that are interested in QA-ViT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆24Apr 29, 2025Updated last year
- ☆13May 21, 2024Updated last year
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions☆55Apr 17, 2024Updated 2 years ago
- EMMA [TMLR 2025]☆12Sep 25, 2025Updated 7 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆22Apr 23, 2025Updated last year
- STVQA and TextVQA OCR results from Amazon Text in Image pipeline☆12Jul 18, 2022Updated 3 years ago
- [AAAI 24] Official Codebase for BridgeQA: Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA☆27Jul 12, 2024Updated last year
- ☆14May 3, 2022Updated 3 years ago
- ☆32Jul 29, 2024Updated last year
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers☆21Jul 26, 2022Updated 3 years ago
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering…☆13Feb 18, 2023Updated 3 years ago
- Code for paper: "Privately generating tabular data using language models".☆15Jun 13, 2023Updated 2 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆326Jan 20, 2025Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Some papers about *diverse* image (a few videos) captioning☆26Apr 4, 2023Updated 3 years ago
- ☆17Aug 11, 2023Updated 2 years ago
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆66Jun 19, 2024Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Jun 28, 2024Updated last year
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆29Jun 6, 2025Updated 10 months ago
- ☆30Jul 25, 2025Updated 9 months ago
- ☆38Jul 24, 2023Updated 2 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆57Jan 22, 2026Updated 3 months ago
- ☆28Jul 18, 2025Updated 9 months ago
- ☆15Jan 9, 2026Updated 3 months ago
- ☆17Sep 23, 2024Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 9 months ago
- ☆22Apr 27, 2024Updated 2 years ago
- The repo for "On-the-fly Modulation for Balanced Multimodal Learning", T-PAMI 2024☆19Sep 29, 2024Updated last year
- An implementation of the Holistic Pursuit for the Multi-Layer Sparse Coding model. Contains a comparison to the projection pursuit algori…☆19Dec 19, 2018Updated 7 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆19May 29, 2023Updated 2 years ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- LongMIL for WSI analysis with extrapolation positional embedding☆14Dec 11, 2024Updated last year
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆41Mar 27, 2025Updated last year
- ☆21Mar 12, 2025Updated last year
- ☆11May 24, 2024Updated last year
- ☆23Apr 19, 2024Updated 2 years ago
- ☆23Sep 28, 2023Updated 2 years ago
- Official Repository for CVPR 2022 paper "REX: Reasoning-aware and Grounded Explanation"☆22Nov 21, 2023Updated 2 years ago