chili-lab / SPORTU
[ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
☆16 · Updated 3 months ago
Alternatives and similar repositories for SPORTU
Users interested in SPORTU are comparing it to the repositories listed below.
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆33 · Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆80 · Updated 10 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆48 · Updated 6 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆33 · Updated 10 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆52 · Updated 3 months ago
- ☆62 · Updated 4 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆82 · Updated 4 months ago
- ☆128 · Updated 8 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆67 · Updated 8 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆46 · Updated 6 months ago
- ☆68 · Updated 3 months ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ☆27 · Updated 2 months ago
- [ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding" ☆61 · Updated 2 months ago
- Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding" ☆28 · Updated last year
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆167 · Updated last month
- ☆42 · Updated 6 months ago
- Multimodal RewardBench ☆58 · Updated 10 months ago
- ☆112 · Updated 5 months ago
- ☆40 · Updated 3 months ago
- ☆37 · Updated last year
- (ICCV 2025) Official repository of the paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆44 · Updated 6 months ago
- ☆96 · Updated 6 months ago
- ☆80 · Updated 6 months ago
- Evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Updated last year
- Code and dataset for the CVPRW paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos" ☆29 · Updated 2 years ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆80 · Updated last month
- Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks ☆33 · Updated last month
- Quick Long Video Understanding ☆72 · Updated 2 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆126 · Updated 5 months ago