☆72Jul 17, 2024Updated last year
Alternatives and similar repositories for QA-ViT
Users that are interested in QA-ViT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- ☆24Apr 29, 2025Updated last year
- Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.☆16Oct 25, 2024Updated last year
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions☆55Apr 17, 2024Updated 2 years ago
- ☆15Nov 29, 2023Updated 2 years ago
- (ICCV 2025) Enhance CLIP and MLLM's fine-grained visual representations with generative models.☆78Jun 25, 2025Updated 11 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Multimodal Semi-Supervised Learning for Text Recognition (SemiMTR)☆83Sep 12, 2023Updated 2 years ago
- ☆15May 10, 2026Updated last month
- Ada-LISTA: Learned Solvers Adaptive to Varying Models☆11Feb 18, 2020Updated 6 years ago
- EMMA [TMLR 2025]☆13Sep 25, 2025Updated 8 months ago
- official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs"☆22Apr 23, 2025Updated last year
- STVQA and TextVQA OCR results from Amazon Text in Image pipeline☆12Jul 18, 2022Updated 3 years ago
- ☆16Dec 25, 2021Updated 4 years ago
- ☆14May 3, 2022Updated 4 years ago
- ☆32Jul 29, 2024Updated last year
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- TextAdaIN: Paying Attention to Shortcut Learning in Text Recognizers☆21Jul 26, 2022Updated 3 years ago
- code for paper "Accessing higher dimensions for unsupervised word translation"☆22Jun 26, 2023Updated 2 years ago
- MAVERICS (Manually-vAlidated Vq^2a Examples fRom Image-Caption datasetS) is a suite of test-only benchmarks for visual question answering…☆13Feb 18, 2023Updated 3 years ago
- [ICLR 2026] General Policy Composition (GPC)☆41May 2, 2026Updated last month
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024)☆16Oct 29, 2024Updated last year
- Code for paper: "Privately generating tabular data using language models".☆15Jun 13, 2023Updated 2 years ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)☆327Jan 20, 2025Updated last year
- ☆22Jun 5, 2025Updated last year
- Some papers about *diverse* image (a few videos) captioning☆25Apr 4, 2023Updated 3 years ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- "Towards Improving Document Understanding: An Exploration on Text-Grounding via MLLMs" 2023☆16Nov 28, 2024Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆66Jun 19, 2024Updated last year
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- Code for 'Robust Federated Learning with Noisy Labels'☆14Jun 28, 2021Updated 4 years ago
- ☆29Jul 25, 2025Updated 10 months ago
- Unified Audio-Visual Perception for Multi-Task Video Localization☆31Apr 19, 2024Updated 2 years ago
- ☆38Jul 24, 2023Updated 2 years ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆57Jan 22, 2026Updated 4 months ago
- ☆28Jul 18, 2025Updated 10 months ago
- GPUs on demand by Runpod - Special Offer Available • AdRun AI, ML, and HPC workloads on powerful cloud GPUs—without limits or wasted spend. Deploy GPUs in under a minute and pay by the second.
- ☆15Jan 9, 2026Updated 5 months ago
- ☆17Sep 23, 2024Updated last year
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types☆32Jul 16, 2025Updated 10 months ago
- ☆22Apr 27, 2024Updated 2 years ago
- The repo for "On-the-fly Modulation for Balanced Multimodal Learning", T-PAMI 2024☆19Sep 29, 2024Updated last year
- An implementation of the Holistic Pursuit for the Multi-Layer Sparse Coding model. Contains a comparison to the projection pursuit algori…☆19Dec 19, 2018Updated 7 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts"☆18May 29, 2023Updated 3 years ago