☆12Dec 15, 2023Updated 2 years ago
Alternatives and similar repositories for TranSTR
Users that are interested in TranSTR are comparing it to the libraries listed below
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23)☆19Mar 9, 2024Updated last year
- ☆36Dec 20, 2023Updated 2 years ago
- ☆101Oct 19, 2022Updated 3 years ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering☆18Oct 9, 2024Updated last year
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024☆27Mar 26, 2024Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆196Jan 14, 2024Updated 2 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22)☆49Jun 8, 2023Updated 2 years ago
- Learning Situation Hyper-Graphs for Video Question Answering☆22Feb 16, 2024Updated 2 years ago
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)☆31Jun 28, 2024Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆83Jul 1, 2024Updated last year
- ☆37Sep 16, 2024Updated last year
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Oct 18, 2023Updated 2 years ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆50Oct 12, 2025Updated 4 months ago
- VQACL: A Novel Visual Question Answering Continual Learning Setting (CVPR'23)☆44Mar 28, 2024Updated last year
- PyTorch implementation of Count-ception and custom CNN counting models for Kaggle Sea Lion Count challenge☆10Jun 30, 2017Updated 8 years ago
- An experiment with movie scenes and contrastive learning☆11Feb 1, 2025Updated last year
- Placeholder☆10Jul 17, 2023Updated 2 years ago
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection.☆51Jul 13, 2022Updated 3 years ago
- Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (…☆51May 29, 2024Updated last year
- The official implementation of "Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings"☆18Dec 5, 2024Updated last year
- The official GitHub repository for AC-EVAL, an ancient Chinese evaluation suite for large language models (LLMs)☆16Nov 12, 2024Updated last year
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".☆55Oct 21, 2025Updated 4 months ago
- ☆14Jul 27, 2021Updated 4 years ago
- Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasonin…☆10Jun 16, 2024Updated last year
- Official Implementation for "SiLVR : A Simple Language-based Video Reasoning Framework"☆19Jan 18, 2026Updated last month
- ☆28Jul 24, 2025Updated 7 months ago
- Visualization of China's historical GDP and population data☆13Jan 18, 2023Updated 3 years ago
- Notes on Hung-yi Lee's Machine Learning 2021 course☆14Nov 27, 2022Updated 3 years ago
- ☆22Jun 5, 2025Updated 9 months ago
- ☆13Feb 26, 2024Updated 2 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion☆56Jul 1, 2025Updated 8 months ago
- ☆13Jul 23, 2024Updated last year
- A comprehensive overview of neural question generation across diverse input formats.☆58Oct 31, 2025Updated 4 months ago
- Differentiable Clustering with Perturbed Random Forests, NeurIPS2023☆13Oct 16, 2023Updated 2 years ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding☆49Jan 9, 2024Updated 2 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025☆17Jul 14, 2025Updated 7 months ago
- Adapt MLLMs to Domains via Post-Training (EMNLP 2025 Findings)☆13Nov 11, 2025Updated 3 months ago
- GoogleNet☆12Mar 24, 2018Updated 7 years ago
- (ICCV 2025) OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation☆14Oct 11, 2025Updated 4 months ago