mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆77 · Mar 26, 2025 · Updated 10 months ago
Alternatives and similar repositories for Flipped-VQA
Users who are interested in Flipped-VQA are comparing it to the libraries listed below.
- Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20… ☆18 · Apr 23, 2024 · Updated last year
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti… ☆23 · Jan 26, 2025 · Updated last year
- Archive for AI grand challenge ☆20 · Jun 6, 2023 · Updated 2 years ago
- ☆16 · Jun 5, 2023 · Updated 2 years ago
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023) ☆35 · Apr 23, 2024 · Updated last year
- ☆16 · Jun 5, 2023 · Updated 2 years ago
- Official Implementation (Pytorch) of the "Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation", EMNLP 2024 (main… ☆12 · Mar 10, 2025 · Updated 11 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 · Nov 15, 2025 · Updated 3 months ago
- 2021 Drone AI challenge ☆16 · Jan 4, 2022 · Updated 4 years ago
- Official PyTorch Implementation for Advancing Bayesian Optimization via Learning Correlated Latent Space (CoBO) ☆18 · Apr 22, 2025 · Updated 9 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆83 · Jul 1, 2024 · Updated last year
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight) ☆20 · Aug 1, 2025 · Updated 6 months ago
- ☆10 · Apr 19, 2024 · Updated last year
- Official Implementation (Pytorch) of "Super-class guided Transformer for Zero-Shot Attribute Classification", AAAI 2025 ☆15 · Jan 15, 2025 · Updated last year
- Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022) ☆17 · Apr 19, 2024 · Updated last year
- Learning Situation Hyper-Graphs for Video Question Answering ☆22 · Feb 16, 2024 · Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆195 · Jan 14, 2024 · Updated 2 years ago
- Official Implementation (Pytorch) of "DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems", ECCV 2024 … ☆74 · Aug 16, 2024 · Updated last year
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models ☆158 · Dec 9, 2024 · Updated last year
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆31 · Jun 28, 2024 · Updated last year
- Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization". ☆32 · Mar 10, 2025 · Updated 11 months ago
- Official Implementation (Pytorch) of the "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024 ☆33 · Feb 12, 2025 · Updated last year
- Basic Artificial Intelligence Theory ☆10 · Mar 11, 2025 · Updated 11 months ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer". ☆54 · Oct 21, 2025 · Updated 3 months ago
- [ACL 2025 Findings] Official pytorch implementation of "Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vis… ☆24 · Jul 21, 2024 · Updated last year
- ☆80 · Nov 24, 2024 · Updated last year
- Belief Revision based Caption Re-ranker with Visual Semantic Information. COLING 2022 ☆11 · Apr 13, 2025 · Updated 10 months ago
- Official pytorch implementation of 'Relation-aware Language-Graph Transformer for Question Answering' (AAAI 2023) ☆18 · Apr 25, 2023 · Updated 2 years ago
- Official PyTorch implementation of "Stochastic Conditional Diffusion Models for Robust Semantic Image Synthesis" (ICML 2024). ☆19 · Nov 20, 2024 · Updated last year
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆85 · Oct 26, 2025 · Updated 3 months ago
- CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment ☆22 · Apr 15, 2022 · Updated 3 years ago
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 · Mar 9, 2024 · Updated last year
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding ☆125 · Dec 10, 2024 · Updated last year
- [ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos ☆126 · Sep 29, 2023 · Updated 2 years ago
- [ICML2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆22 · Jan 1, 2025 · Updated last year
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆37 · Oct 18, 2023 · Updated 2 years ago
- Official implementation of CVPR 2024 paper "Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers". ☆40 · Jul 30, 2025 · Updated 6 months ago
- (2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding ☆345 · Jul 19, 2024 · Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆105 · Nov 28, 2024 · Updated last year