chili-lab / SPORTU
[ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
☆16 · Updated 3 months ago
Alternatives and similar repositories for SPORTU
Users interested in SPORTU are comparing it to the repositories listed below.
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆72 · Updated last year
- Code for our ACL 2025 paper "Language Repository for Long Video Understanding" ☆33 · Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆79 · Updated 9 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆48 · Updated 6 months ago
- ☆62 · Updated 3 months ago
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆39 · Updated last year
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆51 · Updated 2 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆33 · Updated 9 months ago
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆164 · Updated last month
- TStar is a unified temporal search framework for long-form video question answering ☆76 · Updated 3 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 · Updated last year
- ☆37 · Updated last year
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models ☆73 · Updated last year
- [ACL 2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆47 · Updated 9 months ago
- [ECCV 2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆96 · Updated last year
- ☆95 · Updated 5 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆123 · Updated 4 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆89 · Updated 6 months ago
- ☆68 · Updated 3 months ago
- Code for LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos ☆27 · Updated last month
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆37 · Updated 6 months ago
- [EMNLP 2025 Main] AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆88 · Updated 6 months ago
- [ICLR 2024] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆46 · Updated 6 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆79 · Updated 3 weeks ago
- 🤖 [ICLR'25] Multimodal Video Understanding Framework (MVU) ☆51 · Updated 10 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆65 · Updated 7 months ago
- Code and Dataset for the CVPRW Paper "Where did I leave my keys? — Episodic-Memory-Based Question Answering on Egocentric Videos" ☆29 · Updated 2 years ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆36 · Updated 10 months ago
- [NeurIPS 2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators" ☆104 · Updated 2 years ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆48 · Updated 3 months ago