Video Question Answering | Video QA | VQA
☆93 Nov 17, 2025 Updated 4 months ago
Alternatives and similar repositories for Video-Question-Answering_Resources
Users that are interested in Video-Question-Answering_Resources are comparing it to the libraries listed below.
- ☆13 Feb 26, 2024 Updated 2 years ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ☆10 Jul 22, 2024 Updated last year
- ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model ☆16 Jan 31, 2024 Updated 2 years ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆24 Aug 18, 2025 Updated 7 months ago
- LMM for VQA, tcsvt version ☆10 Jul 19, 2024 Updated last year
- ☆17 Aug 11, 2023 Updated 2 years ago
- Resources Related to Event-based Vision | Event Cameras | DVS ☆299 May 30, 2025 Updated 10 months ago
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆29 Sep 27, 2024 Updated last year
- Question-Aware Gaussian Experts for Audio-Visual Question Answering -- Official Pytorch Implementation (CVPR'25, Highlight) ☆28 Jun 6, 2025 Updated 10 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆32 Mar 2, 2025 Updated last year
- [Main Conference @ EACL'26] [Workshop @ NeurIPS'24] 🎞️ LVNet. ☆43 Feb 10, 2026 Updated 2 months ago
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation". ☆55 May 25, 2025 Updated 10 months ago
- Minute-long video generation at 24FPS. ☆61 Mar 28, 2026 Updated 2 weeks ago
- [ACM MM2024] The code for HMLLM. ☆11 Oct 27, 2024 Updated last year
- ☆12 Dec 15, 2023 Updated 2 years ago
- Koishi's Day 2024 Paper (NeurIPS 2024): An advanced persona-driven role-playing system with global faithfulness quantification and optimi… ☆11 Oct 19, 2025 Updated 5 months ago
- [IEEE T-PAMI 2023] Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering ☆20 Jul 6, 2023 Updated 2 years ago
- ☆16 Nov 11, 2025 Updated 5 months ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 Aug 6, 2024 Updated last year
- Visualization of China's historical GDP and population data ☆13 Jan 18, 2023 Updated 3 years ago
- [CVPR 2026] UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models ☆37 Feb 21, 2026 Updated last month
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems ☆86 Nov 12, 2025 Updated 5 months ago
- Implementation of the paper: "BRAVE : Broadening the visual encoding of vision-language models" ☆26 Mar 20, 2026 Updated 3 weeks ago
- ☆29 Feb 12, 2026 Updated 2 months ago
- "FusionFactory: Fusing LLM Capabilities with Routing Data", Tao Feng, Haozhen Zhang, Zijie Lei, Pengrui Han, Mostofa Patwary, Mohammad Sh… ☆20 Dec 30, 2025 Updated 3 months ago
- VideoDetective: Clue Hunting via both Extrinsic Query and Intrinsic Relevance for Long Video Understanding ☆59 Mar 24, 2026 Updated 2 weeks ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 Jan 1, 2026 Updated 3 months ago
- [ICCV 2025] Official Implementation of "Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation". Junyu Xie, Tengda H… ☆22 Jul 26, 2025 Updated 8 months ago
- Github Repo for ICML 2022 paper: Communication-Efficient Adaptive Federated Learning ☆10 Nov 18, 2022 Updated 3 years ago
- The official PyTorch code for "Traffic Scene Parsing through the TSP6K Dataset". ☆34 Jul 6, 2025 Updated 9 months ago
- Archive of Tasks and Results of the Video Browser Showdown ☆13 Feb 2, 2026 Updated 2 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆86 Jul 1, 2024 Updated last year
- ☆12 Nov 3, 2023 Updated 2 years ago
- A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning ☆37 Mar 12, 2026 Updated last month
- The implementation for the work "Unconstrained Monotonic Calibration of Predictions in Deep Ranking Systems". ☆22 Jun 11, 2025 Updated 10 months ago
- ☆17 May 18, 2024 Updated last year
- Code implementation of the paper "World-in-World: World Models in a Closed-Loop World" (ICLR'26 Oral) ☆153 Apr 3, 2026 Updated last week
- [ICCVW2025] V-RoAst: A New Dataset for Visual Road Assessment ☆11 Dec 17, 2025 Updated 3 months ago
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 Mar 9, 2024 Updated 2 years ago