chili-lab / SPORTU
[ICLR2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
☆14 Updated 4 months ago
Alternatives and similar repositories for SPORTU
Users who are interested in SPORTU are comparing it to the repositories listed below.
- Language Repository for Long Video Understanding ☆31 Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆66 Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆71 Updated 4 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆40 Updated last month
- [ECCV 2024] STEVE in Minecraft is for See and Think: Embodied Agent in Virtual Environment ☆39 Updated last year
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated last week
- ☆61 Updated 4 months ago
- ☆50 Updated 3 weeks ago
- ☆57 Updated 3 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆31 Updated 4 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆16 Updated 8 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆102 Updated last week
- ☆63 Updated this week
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆46 Updated 6 months ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 Updated this week
- official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 5 months ago
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time ☆73 Updated last month
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆67 Updated 6 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆46 Updated 4 months ago
- Github repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆35 Updated 2 months ago
- [NeurIPS 2024] Official Implementation for Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks ☆78 Updated last month
- [IROS'25 Oral & NeurIPSw'24] Official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simula… ☆91 Updated last month
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆71 Updated 7 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆93 Updated last week
- ☆87 Updated 3 weeks ago
- [ICCV 2025] Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis. ☆145 Updated last week
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆71 Updated 2 weeks ago
- ☆107 Updated 3 months ago
- ☆36 Updated last year
- MuMA-ToM: Multi-modal Multi-Agent Theory of Mind ☆30 Updated 5 months ago