LLaVA-Next for STVG
☆18 · Dec 5, 2025 · Updated 3 months ago
Alternatives and similar repositories for LLaVA_Next_STVG
Users who are interested in LLaVA_Next_STVG compare it to the repositories listed below.
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability ☆16 · May 8, 2025 · Updated 9 months ago
- Pytorch implementation of the paper 'Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Super… ☆19 · Jan 19, 2024 · Updated 2 years ago
- FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025) ☆34 · Apr 17, 2025 · Updated 10 months ago
- Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining ☆30 · Apr 4, 2022 · Updated 3 years ago
- Frequency tracking in time-frequency representations ☆13 · Jan 19, 2021 · Updated 5 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆19 · Jul 10, 2025 · Updated 7 months ago
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation ☆27 · Updated this week
- ☆11 · Dec 6, 2024 · Updated last year
- ☆13 · Jul 3, 2024 · Updated last year
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers. ☆12 · Nov 7, 2024 · Updated last year
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval ☆21 · Jun 23, 2025 · Updated 8 months ago
- ☆13 · Aug 7, 2025 · Updated 6 months ago
- gorynlich, 2d platform dungeon romp ☆13 · Nov 8, 2021 · Updated 4 years ago
- ☆20 · Nov 21, 2025 · Updated 3 months ago
- Adaptive Multimodal Reasoning via Reinforcement Learning ☆23 · Jan 11, 2026 · Updated last month
- A command-line player of Nintendo NES/Famicom music files (.nsf/.nsfe) ☆12 · Jul 14, 2018 · Updated 7 years ago
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration ☆15 · Nov 18, 2025 · Updated 3 months ago
- ☆13 · May 15, 2025 · Updated 9 months ago
- Mirror from gitlab ☆11 · Jan 9, 2021 · Updated 5 years ago
- ☆20 · Nov 10, 2025 · Updated 3 months ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025) ☆13 · Sep 6, 2024 · Updated last year
- ☆10 · Nov 5, 2018 · Updated 7 years ago
- ☆11 · Nov 27, 2025 · Updated 3 months ago
- The repo for code, that hasn't been published yet ☆14 · May 14, 2025 · Updated 9 months ago
- This repository contains the official code for "Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignm… ☆11 · Oct 9, 2024 · Updated last year
- Unsupported SNES emulator for multicore ARM Cortex A7,A9,A15,A53 Linux platforms. ☆11 · Oct 1, 2024 · Updated last year
- A dark, compact and minimalistic theme for XFCE desktop ( xfm4 windows-manager ). ☆10 · Jul 4, 2016 · Updated 9 years ago
- ☆15 · Dec 2, 2025 · Updated 3 months ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. ☆11 · Jul 16, 2024 · Updated last year
- A terminal-based renderer for OpenGL shaders. Like Shadertoy, but in the terminal. ☆12 · Sep 24, 2023 · Updated 2 years ago
- ☆14 · Sep 11, 2025 · Updated 5 months ago
- Weakly Supervised Referring Video Object Segmentation with Object-Centric Pseudo-Guidance ☆10 · Aug 17, 2024 · Updated last year
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S… ☆14 · Feb 15, 2023 · Updated 3 years ago
- Time frequency ridge detection based on relevant ridge portions ☆11 · Aug 17, 2023 · Updated 2 years ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding ☆79 · Dec 14, 2025 · Updated 2 months ago
- Official implementation of SBNet as described in "Single-branch Network for Multimodal Training". ☆12 · Aug 28, 2023 · Updated 2 years ago
- [ECCV 2024] Official PyTorch implementation of "Classification Matters: Improving Video Action Detection with Class-Specific Attention" ☆17 · Nov 8, 2024 · Updated last year
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆18 · Jun 19, 2025 · Updated 8 months ago
- ☆11 · Nov 5, 2025 · Updated 4 months ago