nusnlp/d2vlm



[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models

☆24

Alternatives and similar repositories for d2vlm

Users interested in d2vlm compare it to the repositories listed below.


ChanglongJiangGit / A2J-Transformer-Plus
[TPAMI 2026] Code for paper "3D Hand Pose Estimation via Articulated Anchor-to-Joint 3D Local Regressors"
☆19 · Jan 19, 2026 · Updated 6 months ago
nusnlp / SlideTailor
[AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers
☆57 · Apr 18, 2026 · Updated 3 months ago
jinfanggan / DeFB
(AAAI2026) DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos
☆15 · Mar 21, 2026 · Updated 4 months ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆46 · Mar 11, 2025 · Updated last year
ZijiaLewisLu / CVPR2025-DeCafNet
Official Repo for CVPR 2025 Paper -- DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
☆17 · Mar 16, 2026 · Updated 4 months ago
minghangz / OnVTG
Online video temporal grounding
☆16 · Oct 20, 2025 · Updated 9 months ago
Ranking-VMR / SPR
☆13 · Jun 11, 2026 · Updated last month
wenzhengzeng / MPEblink
[CVPR 2023] Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video
☆74 · Apr 23, 2026 · Updated 3 months ago
ii-research / RAG_Overview
TBA
☆15 · Aug 19, 2025 · Updated 11 months ago
zjuruizhechen / TVG-R1
[EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning
☆36 · Oct 22, 2025 · Updated 9 months ago
EdenGabriel / TaskWeave
[CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
☆30 · Sep 26, 2024 · Updated last year
ChanglongJiangGit / A2J-Transformer
[CVPR 2023] Code for paper 'A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RG…
☆108 · Jan 19, 2026 · Updated 6 months ago
SHI-Labs / Slow-Fast-Video-Multimodal-LLM
☆29 · Apr 8, 2025 · Updated last year
bytedance / F-16
F-16 is a video large language model (LLM) that perceives high-frame-rate videos, developed by the Department of Electr…
☆40 · Jul 3, 2025 · Updated last year
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65 · Jul 22, 2025 · Updated last year
yhy-2000 / MomentSeeker
☆23 · Jul 23, 2025 · Updated last year
yuanc3 / DATE
Adds absolute time awareness to Qwen2.5-VL's MRoPE with two lines of code
☆29 · Sep 20, 2025 · Updated 10 months ago
marinero4972 / VideoZeroBench
Official implementation of "VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification"
☆21 · May 7, 2026 · Updated 2 months ago
HKUST-LongGroup / DyME
[ICLR 2026] Empowering Small VLMs to Think with Dynamic Memorization and Exploration
☆18 · Mar 18, 2026 · Updated 4 months ago
showlab / FocusUI
[CVPR 2026] FocusUI: Efficient UI Grounding via Position-Preserving Visual Token Selection
☆35 · Jun 7, 2026 · Updated last month
iLearn-Lab / MM23-RTQ
ACM Multimedia 2023 (Oral) - RTQ: Rethinking Video-language Understanding Based on Image-text Model
☆15 · Apr 7, 2026 · Updated 3 months ago
VITA-Group / TTC-Net
[ICML'26] Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning
☆15 · Jun 1, 2026 · Updated last month
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆26 · Feb 23, 2026 · Updated 5 months ago
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13 · Aug 22, 2025 · Updated 11 months ago
Time-Search / TimeSearch-R
[ICLR 2026] Official code for paper: TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinf…
☆27 · Jan 29, 2026 · Updated 5 months ago
gyxxyg / TRACE
[ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling
☆156 · Aug 22, 2025 · Updated 11 months ago
Lzq5 / UniTime
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
☆56 · May 20, 2026 · Updated 2 months ago
TIGER-AI-Lab / VideoEval-Pro
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15 · Jun 1, 2026 · Updated last month
OpenGVLab / VRBench
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
☆28 · Jun 4, 2026 · Updated last month
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆162 · Updated this week
Yarayx / livelongbench
The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…
☆12 · Jun 28, 2025 · Updated last year
yeliudev / R2-Tuning
🌀 R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024)
☆91 · Jul 2, 2024 · Updated 2 years ago
JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 · Jul 3, 2026 · Updated 3 weeks ago
zjucsq / PLA
[ICLR2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12 · Sep 17, 2023 · Updated 2 years ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
yangjie-cv / WeThink
WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning
☆36 · Jun 10, 2025 · Updated last year
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
GeWu-Lab / TSPM
Official repository for "Boosting Audio Visual Question Answering via Key Semantic-Aware Cues" in ACM MM 2024.
☆17 · Oct 25, 2024 · Updated last year
Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 5 months ago