mlvlab/ST-VLM

☆13

Alternatives and similar repositories for ST-VLM

Users who are interested in ST-VLM are comparing it to the repositories listed below.

mlvlab / OVQA
Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20…
☆18 · Apr 23, 2024 · Updated 2 years ago
mlvlab / DialogGSR
Official Implementation (Pytorch) of the "Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation", EMNLP 2024 (main…
☆12 · Mar 10, 2025 · Updated last year
mlvlab / VidChain
Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…
☆25 · Jan 26, 2025 · Updated last year
mlvlab / VT-TWINS
Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022)
☆18 · Apr 19, 2024 · Updated 2 years ago
mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆36 · Feb 22, 2026 · Updated 5 months ago
mlvlab / MELTR
MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023)
☆35 · Apr 23, 2024 · Updated 2 years ago
mlvlab / BLiM
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)
☆26 · Aug 1, 2025 · Updated 11 months ago
mlvlab / PDC
☆10 · Apr 19, 2024 · Updated 2 years ago
mlvlab / Representation-Shift
Official Implementation (Pytorch) of the "Representation Shift: Unifying Token Compression with FlashAttention", ICCV 2025
☆36 · Feb 22, 2026 · Updated 5 months ago
mlvlab / LLaMo
Official Implementation (Pytorch) of the "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024
☆37 · Feb 12, 2025 · Updated last year
mlvlab / DAVI
Official Implementation (Pytorch) of "DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems", ECCV 2024 …
☆75 · Aug 16, 2024 · Updated last year
Mozhgan91 / LEO
LEO: A powerful Hybrid Multimodal LLM
☆20 · Jan 18, 2025 · Updated last year
HLR / VLN-trans
[ACL2023] Official code repository for VLN-Trans
☆14 · Sep 10, 2023 · Updated 2 years ago
mlvlab / SpeaQ
Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati…
☆41 · Apr 19, 2024 · Updated 2 years ago
taco-group / NuScenes-SpatialQA
☆19 · Apr 10, 2025 · Updated last year
chu0802 / SnD
This is an official implementation of our work, Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on V…
☆17 · Sep 24, 2025 · Updated 10 months ago
Haochen-Wang409 / ross3d
[ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
☆70 · Jul 22, 2025 · Updated last year
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆22 · Jul 20, 2024 · Updated 2 years ago
krafton-ai / lexico
KV cache compression via sparse coding
☆17 · Oct 26, 2025 · Updated 8 months ago
TimChou-ntu / GSNeRF
[CVPR 2024] GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
☆18 · Jun 10, 2024 · Updated 2 years ago
ltttpku / CMD-SE-release
☆22 · Jun 6, 2024 · Updated 2 years ago
G-JWLee / TAMP
☆12 · May 15, 2025 · Updated last year
google-research-datasets / egotempo
☆26 · Jun 19, 2026 · Updated last month
NavBench / Evaluation_Code
☆22 · Feb 8, 2026 · Updated 5 months ago
paintscene4d / paintscene4d.github.io
☆25 · Mar 30, 2025 · Updated last year
gbif / text-tree
A simple taxonomic tree format using indented plain text
☆14 · Jun 5, 2026 · Updated last month
mlvlab / Flipped-VQA
Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023)
☆77 · Mar 26, 2025 · Updated last year
zlai0 / TrackTention
☆26 · Mar 26, 2025 · Updated last year
look4u-ok / video-slicer
☆18 · Jun 18, 2024 · Updated 2 years ago
guanw-pku / OED
Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation".
☆30 · Mar 26, 2024 · Updated 2 years ago
DirtyHarryLYL / HAKE-AVA
☆31 · Mar 5, 2025 · Updated last year
genforce / PedGen
[ICLR 2025] Dataset and Code for Paper "Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels"
☆45 · Dec 23, 2025 · Updated 7 months ago
visipedia / inat_loc
The iNaturalist Localization Dataset from "On Label Granularity and Object Localization" (ECCV 2022).
☆15 · Aug 1, 2023 · Updated 2 years ago
ChocoWu / USG
This is the project for 'USG'.
☆39 · Jun 21, 2026 · Updated last month
patrick-tssn / VideoHallucer
VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)
☆43 · Dec 16, 2025 · Updated 7 months ago
chenchao15 / GridPull
☆25 · Oct 5, 2023 · Updated 2 years ago
qiujihao19 / Artemis
[NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos
☆27 · Apr 8, 2025 · Updated last year
Charles-Xie / CQL
Code for our paper "Category Query Learning for Human-Object Interaction Classification" (CVPR2023)
☆37 · Jul 9, 2023 · Updated 3 years ago
jz462 / ContrastiveLosses4VRD
Implementation for the CVPR2019 paper "Graphical Contrastive Losses for Scene Graph Parsing"
☆12 · Nov 11, 2019 · Updated 6 years ago