arturxe2 / ASTRA
PyTorch Implementation of "ASTRA: An Action Spotting TRAnsformer for Soccer Videos", ACM MMSports 2023. | 3rd place solution for SoccerNet Action Spotting Challenge 2023.
☆40 Updated last year
Alternatives and similar repositories for ASTRA
Users that are interested in ASTRA are comparing it to the libraries listed below
- Official PyTorch implementation of "No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding" ☆31 Updated last year
- ☆69 Updated last year
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆31 Updated 9 months ago
- This repo contains the code for our TMLR paper: A Simple Video Segmenter by Tracking Objects Along Axial Trajectories ☆27 Updated 4 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆20 Updated 9 months ago
- Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model ☆105 Updated last month
- [Pattern Recognition 2024] Semantic-Aware Frame-Event Fusion based Pattern Recognition via Large Vision-Language Models, Dong Li, Jiandon… ☆18 Updated 6 months ago
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆87 Updated last year
- CVPR 2025 Workshop on CVEU. ☆41 Updated last month
- Make Your Training Flexible: Towards Deployment-Efficient Video Models ☆30 Updated last month
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model ☆68 Updated 6 months ago
- VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models ☆35 Updated 4 months ago
- Tracking through Containers and Occluders in the Wild (CVPR 2023) - Official Implementation ☆41 Updated last year
- [IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation ☆37 Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆42 Updated last year
- Implementation of the model: "(MC-ViT)" from the paper: "Memory Consolidation Enables Long-Context Video Understanding" ☆22 Updated last week
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆54 Updated 2 weeks ago
- Video-LlaVA fine-tune for CinePile evaluation ☆51 Updated last year
- ☆180 Updated 9 months ago
- (AAAI'25) Training-and-prompt Free General Painterly Image Harmonization Using image-wise attention sharing ☆59 Updated 7 months ago
- VideoLLM: Modeling Video Sequence with Large Language Models ☆158 Updated last year
- ☆50 Updated last year
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 5 months ago
- A benchmark dataset and simple code examples for measuring the perception and reasoning of multi-sensor Vision Language models. ☆18 Updated 7 months ago
- [ACM Multimedia 2025] "Multi-Agent System for Comprehensive Soccer Understanding" ☆31 Updated last month
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024) ☆34 Updated last year
- 3D Traffic Light & Sign Dataset ☆19 Updated 4 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 Updated last year
- Multiple Transformation Function Estimation for Image Enhancement ☆22 Updated 9 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆48 Updated last month