☆12 · Dec 15, 2023 · Updated 2 years ago
Alternatives and similar repositories for TranSTR
Users that are interested in TranSTR are comparing it to the libraries listed below.
- Contrastive Video Question Answering via Video Graph Transformer (IEEE T-PAMI'23) ☆19 · Mar 9, 2024 · Updated 2 years ago
- ☆36 · Dec 20, 2023 · Updated 2 years ago
- [EMNLP’24 Main] Encoding and Controlling Global Semantics for Long-form Video Question Answering ☆18 · Oct 9, 2024 · Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆198 · Jan 14, 2024 · Updated 2 years ago
- Video Graph Transformer for Video Question Answering (ECCV'22) ☆49 · Jun 8, 2023 · Updated 2 years ago
- ☆101 · Oct 19, 2022 · Updated 3 years ago
- Official repository of "Prompting Segmentation with Sound is Generalizable Audio-Visual Source Localizer", AAAI 2024 ☆27 · Mar 14, 2026 · Updated last week
- This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023) ☆31 · Jun 28, 2024 · Updated last year
- ☆13 · Feb 26, 2024 · Updated 2 years ago
- Learning Situation Hyper-Graphs for Video Question Answering ☆22 · Feb 16, 2024 · Updated 2 years ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆84 · Jul 1, 2024 · Updated last year
- An experiment with movie scenes and contrastive learning ☆11 · Feb 1, 2025 · Updated last year
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆50 · Oct 12, 2025 · Updated 5 months ago
- ☆13 · Aug 14, 2022 · Updated 3 years ago
- Text Proxy: Decomposing Retrieval from a 1-to-N Relationship into N 1-to-1 Relationships for Text-Video Retrieval -- AAAI2025 ☆17 · Jul 14, 2025 · Updated 8 months ago
- ☆37 · Sep 16, 2024 · Updated last year
- Notes on Hung-yi Lee's Machine Learning 2021 course ☆14 · Nov 27, 2022 · Updated 3 years ago
- Visualization of China's historical GDP and population data ☆13 · Jan 18, 2023 · Updated 3 years ago
- Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering [ACM MM'24] ☆10 · Jul 22, 2024 · Updated last year
- The official GitHub repository for AC-EVAL, an ancient Chinese evaluation suite for large language models (LLMs) ☆16 · Nov 12, 2024 · Updated last year
- ☆66 · Feb 1, 2026 · Updated last month
- Official repo for CVPR 2022 (Oral) paper: Revisiting the "Video" in Video-Language Understanding. Contains code for the Atemporal Probe (… ☆51 · May 29, 2024 · Updated last year
- Official PyTorch implementation of the ECCV 2022 paper: Efficient Video Transformers with Spatial-Temporal Token Selection. ☆52 · Jul 13, 2022 · Updated 3 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models ☆158 · Dec 9, 2024 · Updated last year
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 · Jan 9, 2024 · Updated 2 years ago
- [ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning" ☆136 · May 5, 2023 · Updated 2 years ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer ☆37 · Oct 18, 2023 · Updated 2 years ago
- ☆138 · Sep 29, 2024 · Updated last year
- ☆24 · Apr 4, 2022 · Updated 3 years ago
- Code for the paper: Studying How to Efficiently and Effectively Guide Models with Explanations. ICCV 2023. ☆19 · Nov 1, 2023 · Updated 2 years ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆56 · Jul 1, 2025 · Updated 8 months ago
- PyTorch implementation of Count-ception and custom CNN counting models for Kaggle Sea Lion Count challenge ☆10 · Jun 30, 2017 · Updated 8 years ago
- Implementation Code for paper "Efficient Multimodal Fusion via Interactive Prompting" in CVPR2023 ☆17 · Jul 24, 2023 · Updated 2 years ago
- This repository contains code for AAAI2025 paper "Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal … ☆23 · Aug 18, 2025 · Updated 7 months ago
- ☆24 · Sep 24, 2023 · Updated 2 years ago
- A Data Visualization project on the French traffic accidents database ☆19 · Aug 27, 2019 · Updated 6 years ago
- [ECCV 2022] GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval ☆17 · Aug 24, 2022 · Updated 3 years ago
- Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer". ☆55 · Oct 21, 2025 · Updated 5 months ago
- ☆17 · May 31, 2023 · Updated 2 years ago