chili-lab / SPORTU
[ICLR2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
☆14 Updated 5 months ago
Alternatives and similar repositories for SPORTU
Users that are interested in SPORTU are comparing it to the libraries listed below
- Language Repository for Long Video Understanding ☆32 Updated last year
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆66 Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆71 Updated 5 months ago
- ☆52 Updated this week
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆40 Updated 3 weeks ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆51 Updated last year
- ☆87 Updated last month
- DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning ☆30 Updated this week
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆28 Updated last month
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 Updated 5 months ago
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆41 Updated last month
- ☆61 Updated 5 months ago
- ☆138 Updated 10 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" ☆26 Updated 3 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆16 Updated 9 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆47 Updated 7 months ago
- ☆107 Updated 3 months ago
- [NeurIPS 2024] A task generation and model evaluation system for multimodal language models. ☆72 Updated 8 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆223 Updated 4 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆104 Updated 2 weeks ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering" ☆100 Updated 9 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆59 Updated 4 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆40 Updated 5 months ago
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆26 Updated 7 months ago
- [EMNLP 2024] A Video Chat Agent with Temporal Prior ☆31 Updated 5 months ago
- ☆72 Updated 2 weeks ago
- ☆17 Updated 3 months ago
- Official code for "BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning" ☆36 Updated 6 months ago
- A Comprehensive Benchmark for Robust Multi-image Understanding ☆12 Updated 11 months ago
- Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models ☆118 Updated 4 months ago