☆13 Mar 28, 2025 Updated 11 months ago
Alternatives and similar repositories for ST-VLM
Users who are interested in ST-VLM are comparing it to the repositories listed below
- Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20… ☆18 Apr 23, 2024 Updated last year
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1" ☆31 Feb 22, 2026 Updated last week
- Official Implementation (Pytorch) of the "Generative Subgraph Retrieval for Knowledge Graph-Grounded Dialog Generation", EMNLP 2024 (main… ☆12 Mar 10, 2025 Updated 11 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti… ☆23 Jan 26, 2025 Updated last year
- ☆18 Apr 10, 2025 Updated 10 months ago
- MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models (CVPR 2023) ☆35 Apr 23, 2024 Updated last year
- LEO: A powerful Hybrid Multimodal LLM ☆19 Jan 18, 2025 Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆19 Jul 20, 2024 Updated last year
- Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022) ☆17 Apr 19, 2024 Updated last year
- [CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning" ☆112 Feb 25, 2026 Updated last week
- Official implementation of paper "OED: Towards One-stage End-to-End Dynamic Scene Graph Generation". ☆26 Mar 26, 2024 Updated last year
- ☆25 Mar 30, 2025 Updated 11 months ago
- ☆26 Mar 26, 2025 Updated 11 months ago
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness ☆67 Jul 22, 2025 Updated 7 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ☆82 Jul 4, 2025 Updated 7 months ago
- ☆31 Mar 5, 2025 Updated 11 months ago
- Official Implementation (Pytorch) of the "LLaMo: Large Language Model-based Molecular Graph Assistant", NeurIPS 2024 ☆33 Feb 12, 2025 Updated last year
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics ☆38 Sep 10, 2025 Updated 5 months ago
- [NeurIPS 2024] Artemis: Towards Referential Understanding in Complex Videos ☆27 Apr 8, 2025 Updated 10 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 Oct 14, 2024 Updated last year
- The official implementation of "Cross-modal Causal Relation Alignment for Video Question Grounding. (CVPR 2025 Highlight)" ☆43 Apr 27, 2025 Updated 10 months ago
- Official Implementation (Pytorch) of "DAVI: Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems", ECCV 2024 … ☆74 Aug 16, 2024 Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ☆96 Apr 14, 2025 Updated 10 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 Apr 12, 2023 Updated 2 years ago
- The repository of VG-Refiner paper ☆17 Dec 9, 2025 Updated 2 months ago
- ☆66 Feb 23, 2026 Updated last week
- Code for our paper "Category Query Learning for Human-Object Interaction Classification" (CVPR2023) ☆37 Jul 9, 2023 Updated 2 years ago
- ☆47 Sep 13, 2024 Updated last year
- ☆10 Apr 7, 2025 Updated 10 months ago
- Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati… ☆41 Apr 19, 2024 Updated last year
- [ICLR 2025] Dataset and Code for Paper "Learning to Generate Diverse Pedestrian Movements from Web Videos with Noisy Labels" ☆45 Dec 23, 2025 Updated 2 months ago
- ☆11 Jan 18, 2025 Updated last year
- Implementation for the CVPR2019 paper "Graphical Contrastive Losses for Scene Graph Parsing" ☆12 Nov 11, 2019 Updated 6 years ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 Jun 17, 2024 Updated last year
- A large-scale training and benchmarking framework for rPPG. ☆10 Nov 26, 2024 Updated last year
- ☆10 Oct 5, 2022 Updated 3 years ago
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 Jul 10, 2025 Updated 7 months ago
- ☀️ [ArXiv 2025] PyTorch Implementation of D2GV: Deformable 2D Gaussian Splatting for Video Representation in 400FPS ☆57 Mar 13, 2025 Updated 11 months ago
- ☆13 Jan 21, 2025 Updated last year