zaiquanyang / LLaVA_Next_STVG
LLaVA-Next for STVG
☆18 · Dec 5, 2025 · Updated 2 months ago
Alternatives and similar repositories for LLaVA_Next_STVG
Users interested in LLaVA_Next_STVG are comparing it to the repositories listed below.
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability · ☆16 · May 8, 2025 · Updated 9 months ago
- Pytorch implementation of the paper 'Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Super… · ☆18 · Jan 19, 2024 · Updated 2 years ago
- FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025) · ☆34 · Apr 17, 2025 · Updated 9 months ago
- Weakly Supervised Video Moment Localisation with Contrastive Negative Sample Mining · ☆30 · Apr 4, 2022 · Updated 3 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models. · ☆18 · Jul 10, 2025 · Updated 7 months ago
- ☆11 · Dec 6, 2024 · Updated last year
- This branch of Asteroid contains code for the vocal harmony and chamber ensemble separation related papers. · ☆12 · Nov 7, 2024 · Updated last year
- ☆13 · Jul 3, 2024 · Updated last year
- Frequency tracking in time-frequency representations · ☆13 · Jan 19, 2021 · Updated 5 years ago
- [CVPR 2025] DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval · ☆22 · Jun 23, 2025 · Updated 7 months ago
- ☆10 · Nov 5, 2018 · Updated 7 years ago
- ☆20 · Nov 21, 2025 · Updated 2 months ago
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration · ☆15 · Nov 18, 2025 · Updated 2 months ago
- ☆14 · Dec 2, 2025 · Updated 2 months ago
- Time frequency ridge detection based on relevant ridge portions · ☆11 · Aug 17, 2023 · Updated 2 years ago
- ☆11 · Nov 27, 2025 · Updated 2 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding · ☆73 · Dec 14, 2025 · Updated 2 months ago
- Weakly Supervised Referring Video Object Segmentation with Object-Centric Pseudo-Guidance · ☆10 · Aug 17, 2024 · Updated last year
- This repository contains the official code for "Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention, Alignm… · ☆12 · Oct 9, 2024 · Updated last year
- A command-line player of Nintendo NES/Famicom music files (.nsf/.nsfe) · ☆12 · Jul 14, 2018 · Updated 7 years ago
- ☆18 · Nov 10, 2025 · Updated 3 months ago
- ☆15 · Sep 11, 2025 · Updated 5 months ago
- The repo for code that hasn't been published yet · ☆14 · May 14, 2025 · Updated 9 months ago
- This repository contains the speaker labeled information of VoxCeleb2 and LRS3 audio-visual datasets. (AAAI 2025) · ☆12 · Sep 6, 2024 · Updated last year
- A dark, compact and minimalistic theme for the XFCE desktop (xfwm4 window manager). · ☆10 · Jul 4, 2016 · Updated 9 years ago
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding. · ☆11 · Jul 16, 2024 · Updated last year
- F-16 is a powerful video large language model (LLM) that perceives high-frame-rate videos, which is developed by the Department of Electr… · ☆34 · Jul 3, 2025 · Updated 7 months ago
- https://avocado-captioner.github.io/ · ☆29 · Oct 16, 2025 · Updated 3 months ago
- Unsupported SNES emulator for multicore ARM Cortex A7,A9,A15,A53 Linux platforms. · ☆11 · Oct 1, 2024 · Updated last year
- gorynlich, a 2D platform dungeon romp · ☆13 · Nov 8, 2021 · Updated 4 years ago
- Mirror from gitlab · ☆11 · Jan 9, 2021 · Updated 5 years ago
- ☆13 · May 15, 2025 · Updated 8 months ago
- Code and data recipes for the paper: Optimal Condition Training for Target Source Separation by Efthymios Tzinis, Gordon Wichern, Paris S… · ☆14 · Feb 15, 2023 · Updated 2 years ago
- A terminal-based renderer for OpenGL shaders. Like Shadertoy, but in the terminal. · ☆12 · Sep 24, 2023 · Updated 2 years ago
- Cross-Speaker Encoding Network for Multi-talker Speech Recognition · ☆11 · Mar 14, 2025 · Updated 11 months ago
- ☆12 · Dec 26, 2023 · Updated 2 years ago
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos" · ☆13 · Aug 22, 2025 · Updated 5 months ago
- [ICTC'24] - "Voice-Based Age and Gender Recognition: A Comparative Study of LSTM, RezoNet and Hybrid CNNs-BiLSTM Architecture" by Nhut Mi… · ☆10 · Jan 16, 2025 · Updated last year
- [ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision · ☆12 · Sep 17, 2023 · Updated 2 years ago