jyrao / SoccerAgent
[ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding"
☆55 Updated 3 weeks ago
Alternatives and similar repositories for SoccerAgent
Users that are interested in SoccerAgent are comparing it to the libraries listed below
- [ACL 2025] Rethinking Step-by-step Visual Reasoning in LLMs ☆307 Updated 6 months ago
- [CVPR 2025] "Towards Universal Soccer Video Understanding". ☆195 Updated 2 months ago
- This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024) ☆272 Updated 11 months ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS25] ☆250 Updated 2 weeks ago
- [ACL2025 Oral & Award] Evaluate Image/Video Generation like Humans - Fast, Explainable, Flexible ☆107 Updated 3 months ago
- [CVPR2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models ☆229 Updated 2 weeks ago
- Codes for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models ☆265 Updated 3 months ago
- VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆277 Updated last month
- [EMNLP 2024 Oral] MatchTime: Towards Automatic Soccer Game Commentary Generation ☆87 Updated 10 months ago
- [ICLR 2025] SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models ☆15 Updated 2 months ago
- Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding ☆291 Updated 3 months ago
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" ☆176 Updated 8 months ago
- OpenVLThinker: An Early Exploration to Vision-Language Reasoning via Iterative Self-Improvement ☆119 Updated 4 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos" ☆40 Updated 4 months ago
- Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024] ☆231 Updated 8 months ago
- Long Context Transfer from Language to Vision ☆397 Updated 8 months ago
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance" ☆130 Updated last year
- [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos ☆48 Updated 5 months ago
- ✨✨[NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi… ☆343 Updated 3 weeks ago
- ☆61 Updated 2 months ago
- [CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction ☆145 Updated 8 months ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆78 Updated 8 months ago
- ☆68 Updated 2 months ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆162 Updated 10 months ago
- ☆189 Updated last year
- An official implementation of "CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning" ☆142 Updated 2 weeks ago
- [IEEE VIS 2024] LLaVA-Chart: Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruc… ☆72 Updated 10 months ago
- The official repo for "Vidi: Large Multimodal Models for Video Understanding and Editing" ☆148 Updated 2 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆71 Updated 2 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture ☆211 Updated 10 months ago