Video Question Answering | Video QA | VQA
☆95Nov 17, 2025Updated 5 months ago
Alternatives and similar repositories for Video-Question-Answering_Resources
Users that are interested in Video-Question-Answering_Resources are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24]☆10Jul 22, 2024Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model☆16Apr 7, 2026Updated 3 weeks ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆18Oct 9, 2024Updated last year
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal …☆24Aug 18, 2025Updated 8 months ago
- ☆17Aug 11, 2023Updated 2 years ago
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment☆29Sep 27, 2024Updated last year
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight)☆29Jun 6, 2025Updated 10 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior☆33Mar 2, 2025Updated last year
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet.☆43Feb 10, 2026Updated 2 months ago
- Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception☆18May 7, 2024Updated last year
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".☆55May 25, 2025Updated 11 months ago
- [ACM MM2024] The code for HMLLM.☆11Oct 27, 2024Updated last year
- Repository of PIXAR, a Pixel-based Auto-Regressive Language Model☆18Sep 15, 2025Updated 7 months ago
- ☆12Dec 15, 2023Updated 2 years ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- Software Engineering Back End Microservices Project☆15Nov 20, 2024Updated last year
- ☆16Nov 11, 2025Updated 5 months ago
- TTRV: Test-Time Reinforcement Learning for Vision–Language Models (CVPR 2026)☆39Mar 8, 2026Updated last month
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis…☆25Jul 21, 2024Updated last year
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment☆16Aug 6, 2024Updated last year
- Implementation of the DocLLM paper for Llama models.☆13Apr 6, 2025Updated last year
- [CVPR 2021] SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events☆66Feb 9, 2026Updated 2 months ago
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models"☆26Apr 13, 2026Updated 3 weeks ago
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems☆89Nov 12, 2025Updated 5 months ago
- End-to-end encrypted email - Proton Mail • AdSpecial offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
- ☆33Feb 12, 2026Updated 2 months ago
- Adapting Precomputed Features for Efficient Graph Condensation (ICML 2025)☆48Jun 27, 2025Updated 10 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding☆58Mar 24, 2026Updated last month
- [ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H…☆22Jul 26, 2025Updated 9 months ago
- Github Repo for ICML 2022 paper: Communication-Efficient Adaptive Federated Learning☆10Nov 18, 2022Updated 3 years ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Apr 18, 2026Updated 2 weeks ago
- Archive of Tasks and Results of the Video Browser Showdown☆13Feb 2, 2026Updated 3 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆87Jul 1, 2024Updated last year
- [NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+☆47Feb 11, 2026Updated 2 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆14Oct 6, 2024Updated last year
- ☆12Nov 3, 2023Updated 2 years ago
- Code for Neural Networks journal paper - StoCFL: A stochastically clustered federated learning framework for Non-IID data with dynamic cl…☆13Apr 28, 2024Updated 2 years ago
- ☆17May 18, 2024Updated last year
- The first comprehensive multimodal language analysis benchmark for evaluating foundation models☆31Sep 22, 2025Updated 7 months ago
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral)☆162Apr 3, 2026Updated last month
- [ICCVW2025] V-RoAst: Visual Road Assessment. Can VLM be a Road Safety Assessor Using the iRAP Standard?☆11Dec 17, 2025Updated 4 months ago