👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 Jan 20, 2025 Updated last year
Alternatives and similar repositories for ETBench
Users who are interested in ETBench are comparing it to the libraries listed below.
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆43 Mar 11, 2025 Updated last year
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆137 Jul 28, 2025 Updated 7 months ago
- ☆32 Jul 29, 2024 Updated last year
- [NeurIPS 2024] Mitigating Object Hallucination via Concentric Causal Attention ☆66 Aug 30, 2025 Updated 6 months ago
- ☆80 Nov 24, 2024 Updated last year
- [CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs ☆114 Mar 12, 2026 Updated last week
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆83 Jul 4, 2025 Updated 8 months ago
- 🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026) ☆311 Feb 8, 2026 Updated last month
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆19 Jul 10, 2025 Updated 8 months ago
- Data release for Step Differences in Instructional Video (CVPR24) ☆14 Jun 19, 2024 Updated last year
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments". ☆296 Jun 13, 2024 Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 Jun 4, 2025 Updated 9 months ago
- ☆13 Aug 7, 2025 Updated 7 months ago
- [NeurIPS 2023] Rewrite Caption Semantics: Bridging Semantic Gaps for Language-Supervised Semantic Segmentation ☆20 Jan 3, 2024 Updated 2 years ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection ☆57 Jul 5, 2024 Updated last year
- ☆18 Jul 10, 2024 Updated last year
- ☆31 Nov 17, 2024 Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆106 Nov 28, 2024 Updated last year
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, … ☆129 Apr 4, 2025 Updated 11 months ago
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆28 Feb 28, 2026 Updated 2 weeks ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆43 Feb 5, 2025 Updated last year
- [ICCV 2025] Dynamic-VLM ☆28 Dec 16, 2024 Updated last year
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024) ☆32 Mar 29, 2024 Updated last year
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆140 Aug 21, 2025 Updated 6 months ago
- Official Implementation of "Chrono: A Simple Blueprint for Representing Time in MLLMs" ☆92 Mar 9, 2025 Updated last year
- ☆14 Oct 30, 2023 Updated 2 years ago
- [EMNLP 2025 Main] The official repo of MMLU-ProX benchmark. ☆27 Aug 26, 2025 Updated 6 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆64 Sep 13, 2024 Updated last year
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding ☆293 Aug 5, 2025 Updated 7 months ago
- ☆18 Jan 26, 2026 Updated last month
- [NeurIPS 24' D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆114 Jul 27, 2024 Updated last year
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding ☆126 Dec 10, 2024 Updated last year
- ☆61 Feb 27, 2026 Updated 3 weeks ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆38 Nov 10, 2024 Updated last year
- [ICML 2025] Official PyTorch implementation of LongVU ☆424 May 8, 2025 Updated 10 months ago
- Code for the paper: "Sentence Specified Dynamic Video Thumbnail Generation" ☆33 Aug 8, 2019 Updated 6 years ago
- ☆28 Apr 8, 2025 Updated 11 months ago
- Implementation Code for paper "Efficient Multimodal Fusion via Interactive Prompting" in CVPR2023 ☆17 Jul 24, 2023 Updated 2 years ago
- R1-like Video-LLM for Temporal Grounding ☆135 Jun 20, 2025 Updated 9 months ago