zaiquanyang/LLaVA_Next_STVG

zaiquanyang / LLaVA_Next_STVG

LLaVA-Next for STVG

☆21

Alternatives and similar repositories for LLaVA_Next_STVG

Users who are interested in LLaVA_Next_STVG are comparing it to the repositories listed below.

minghangz / OnVTG
Online video temporal grounding
☆16 · Oct 20, 2025 · Updated 9 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
minjoong507 / Consistency-of-Video-LLM
[CVPR 2025] Official Repository of the paper "On the Consistency of Video Large Language Models in Temporal Comprehension"
☆16 · Oct 13, 2025 · Updated 9 months ago
sunoh-kim / pps
Pytorch implementation of the paper 'Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Super…
☆19 · Jan 19, 2024 · Updated 2 years ago
aiha-lab / InfiniPot-V
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20 · Jan 25, 2026 · Updated 6 months ago
Jayce1kk / SpaceVLLM
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
☆17 · May 8, 2025 · Updated last year
Ziyang412 / Video-RTS
Code for EMNLP25 paper "Video-RTS: Rethinking Reinforcement Learning and Test-Time Scaling for Efficient and Enhanced Video Reasoning"
☆24 · Feb 18, 2026 · Updated 5 months ago
minghangz / cnm
Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining
☆31 · Apr 4, 2022 · Updated 4 years ago
V-STaR-Bench / V-STaR
Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
☆45 · Mar 2, 2026 · Updated 4 months ago
Zhuo-Cao / FlashVTG
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025)
☆39 · Apr 17, 2025 · Updated last year
xiaomi-research / timeviper
[CVPR'26] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
☆25 · Jan 4, 2026 · Updated 6 months ago
Dmmm1997 / InstanceVG
[TPAMI2025] Improving Generalized Visual Grounding with Instance-aware Joint Learning
☆33 · Apr 28, 2026 · Updated 3 months ago
yongliang-wu / NumPro
[CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga
☆150 · Jan 19, 2026 · Updated 6 months ago
Xianqi-Zhang / FLAM
☆11 · Nov 27, 2025 · Updated 8 months ago
hqhQAQ / Hint-GRPO
[ICCV 2025] Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
☆48 · Jul 1, 2025 · Updated last year
jy0205 / STCAT
[NeurIPS 2022] Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding
☆54 · Mar 5, 2024 · Updated 2 years ago
zsgvivo / VideoZoomer
☆34 · Feb 12, 2026 · Updated 5 months ago
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56 · May 20, 2026 · Updated 2 months ago
SimpleVQA / SimpleVQA
SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models
☆15 · Feb 20, 2025 · Updated last year
maifoundations / Streamo
Streaming Video Instruction Tuning
☆83 · Feb 25, 2026 · Updated 5 months ago
Hui-design / TSPO
[AAAI 2026] ✨ TSPO: Temporal Sampling Policy Optimization for Long-form Video Language Understanding
☆131 · Nov 12, 2025 · Updated 8 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025
☆111 · Mar 14, 2025 · Updated last year
zjuruizhechen / TVG-R1
[EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
☆36 · Oct 22, 2025 · Updated 9 months ago
jinpeng0528 / BalConpas
Code release for "Strike a Balance in Continual Panoptic Segmentation" (ECCV 2024)
☆14 · Mar 14, 2025 · Updated last year
marukosan93 / ORPDAD
This is the official code repository of our dataset and ECCV 2024 paper entitled "Oulu Remote-photoplethysmography Physical Domain Attac…
☆14 · Jul 9, 2025 · Updated last year
gaostar123 / DeViL
[ACM MM 2026] Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
☆27 · Jul 12, 2026 · Updated 2 weeks ago
xxayt / MGSV
[ICCV 2025] This repo is the official implementation of "Music Grounding by Short Video"
☆27 · Sep 9, 2025 · Updated 10 months ago
VUT-HFUT / MAC_2024_baseline
[MAC 2024] The baseline code for MAC 2024.
☆12 · Jun 3, 2025 · Updated last year
hellowangqian / UDA-norm-VAE
Pytorch implementation for the paper: Data augmentation with norm-AE and selective pseudo-labelling for unsupervised domain adaptation
☆14 · Mar 23, 2023 · Updated 3 years ago
xiaomi-research / time-r1
[NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
☆95 · Dec 14, 2025 · Updated 7 months ago
nianfd / RWKV-VG
☆10 · Dec 3, 2024 · Updated last year
ayanglab / AIIB
☆10 · May 10, 2024 · Updated 2 years ago
BetterZH / SEVLM-code
Training A Small Emotional Vision Language Model for Visual Art Comprehension
☆17 · Jul 26, 2024 · Updated 2 years ago
sunye23 / SAMA
[NeurIPS 2025] SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models.
☆17 · May 26, 2026 · Updated 2 months ago
EndoluminalSurgicalVision-IMR / DGCI
[MICCAI 2024] Implicit Representation Embraces Challenging Attributes of Pulmonary Airway Tree Structures
☆14 · Nov 13, 2024 · Updated last year
Beckschen / spatialcode
Open studio for "Thinking with Spatial Code" (https://arxiv.org/pdf/2603.05591)
☆20 · Mar 18, 2026 · Updated 4 months ago
saurjya / EnsembleSep
This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers.
☆12 · Nov 7, 2024 · Updated last year
WHB139426 / Grounded-Video-LLM
[EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
☆149 · Aug 21, 2025 · Updated 11 months ago
tanvir-utexas / PaPr
☆13 · Jul 3, 2024 · Updated 2 years ago