jyrao / SoccerAgent
[ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding"
☆30 Updated last week
Alternatives and similar repositories for SoccerAgent
Users who are interested in SoccerAgent are comparing it to the repositories listed below
- [ACL2025 Oral] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆70 Updated 3 weeks ago
- ☆61 Updated 4 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆212 Updated 2 weeks ago
- [ACL 2025 🔥] Rethinking Step-by-step Visual Reasoning in LLMs ☆304 Updated last month
- An open source implementation of CLIP (With TULIP Support) ☆160 Updated 2 months ago
- Pixel-Level Reasoning Model trained with RL ☆167 Updated 2 weeks ago
- [CVPR 2025] "Towards Universal Soccer Video Understanding". ☆172 Updated 4 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆93 Updated this week
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆131 Updated 8 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆220 Updated 3 months ago
- [ICCV 2025] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆164 Updated 4 months ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆31 Updated 8 months ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆245 Updated 8 months ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc… ☆68 Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 Updated 6 months ago
- CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts ☆151 Updated last year
- [EMNLP 2024 Oral] MatchTime: Towards Automatic Soccer Game Commentary Generation ☆77 Updated 6 months ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆230 Updated 3 weeks ago
- [ICLR2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models ☆14 Updated 4 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 4 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆206 Updated 6 months ago
- A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions. ☆144 Updated 5 months ago
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding ☆279 Updated this week
- VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning ☆162 Updated last month
- LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning ☆140 Updated 2 months ago
- ☆68 Updated last year
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆105 Updated last week
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆224 Updated 7 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context ☆163 Updated 9 months ago
- Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning ☆16 Updated 8 months ago