TencentARC/DSR_Suite

☆74

Alternatives and similar repositories for DSR_Suite

Users who are interested in DSR_Suite are comparing it to the libraries listed below.

ShijieZhou-UCLA / VLM4D
[ICCV 2025] VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
☆55 · Nov 20, 2025 · Updated 8 months ago
GVCLab / MLLM-4D
[ICML 2026] MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence
☆36 · May 1, 2026 · Updated 2 months ago
VINHYU / OpenSpatial
☆92 · May 8, 2026 · Updated 2 months ago
TencentARC / MindOmni
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139 · Oct 15, 2025 · Updated 9 months ago
Dynamics-X / Thinking-in-Dynamics
[CVPR 2026] "Thinking in Dynamics: How Multimodal Large Language Models Perceive, Track, and Reason Dynamics in Physical 4D World"
☆17 · Jul 7, 2026 · Updated 2 weeks ago
RobertLuo1 / iccv2023_RVOS_Challenge
[ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition
☆14 · Jan 1, 2024 · Updated 2 years ago
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆162 · Updated this week
EasonXiao-888 / SpatialEdit
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
☆214 · Apr 13, 2026 · Updated 3 months ago
InternLM / Spatial-SSRL
[CVPR 2026] Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"
☆133 · Apr 7, 2026 · Updated 3 months ago
W-Ted / N3D-VLM
Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
☆117 · Jan 14, 2026 · Updated 6 months ago
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 · Jul 27, 2025 · Updated 11 months ago
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆430 · Jul 15, 2026 · Updated last week
SuhZhang / GeoSR
The code for paper 'Make Geometry Matter for Spatial Reasoning'
☆53 · Updated this week
Jayce1kk / SpaceVLLM
SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability
☆17 · May 8, 2025 · Updated last year
KlingAIResearch / StereoPilot
The official implementation of StereoPilot
☆116 · Dec 19, 2025 · Updated 7 months ago
Yangr116 / VST
[ECCV2026] Visual Spatial Tuning
☆200 · Mar 25, 2026 · Updated 4 months ago
NVlabs / Long-RL
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
☆727 · Sep 24, 2025 · Updated 10 months ago
lyk412 / Consistent123
[ACMMM 2024] Consistent123: One Image to Highly Consistent 3D Asset Using Case-Aware Diffusion Priors
☆25 · Oct 22, 2024 · Updated last year
WanyueZhang-ai / World2VLM
☆23 · Apr 30, 2026 · Updated 2 months ago
LaVi-Lab / Video-3D-LLM
[CVPR 2025] The code for paper "Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding".
☆219 · Jun 4, 2025 · Updated last year
XinYu-Andy / SelfE
Code of SelfE (CVPR 2026)
☆45 · Mar 16, 2026 · Updated 4 months ago
Tencent / HaploVLM
ICML2025
☆63 · Aug 28, 2025 · Updated 10 months ago
IIGROUP / SCL
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning
☆20 · Dec 21, 2023 · Updated 2 years ago
SalesforceAIResearch / strefer
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data
☆19 · Jun 2, 2026 · Updated last month
KlingAIResearch / VANS
[CVPR 2026] Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO
☆119 · Feb 28, 2026 · Updated 4 months ago
Li-Hao-yuan / GeoThinker
☆69 · Feb 12, 2026 · Updated 5 months ago
song2yu / SIBench-VSR
This is a project on visual spatial reasoning tasks-SIBench
☆27 · Jan 12, 2026 · Updated 6 months ago
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆246 · Nov 28, 2025 · Updated 7 months ago
qiujihao19 / LongVideo-R1
[CVPR 2026] LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding
☆50 · Jul 7, 2026 · Updated 2 weeks ago
THU-SI / Spatial-MLLM
[NeurIPS 2025 Spotlight] Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
☆480 · Feb 5, 2026 · Updated 5 months ago
mll-lab-nu / MindCube
☆163 · Mar 23, 2026 · Updated 4 months ago
Visual-AI / 3DRS
[NeurIPS 2025] 3DRS: MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
☆158 · Dec 9, 2025 · Updated 7 months ago
ShijieZhou-UCLA / Feature4X
[CVPR 2025] Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
☆41 · Oct 18, 2025 · Updated 9 months ago
InternRobotics / G2VLM
[CVPR 2026] G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
☆346 · Apr 18, 2026 · Updated 3 months ago
CVMI-Lab / Hita
(ICCV 2025) Holistic Tokenizer for Autoregressive Image Generation
☆34 · Oct 9, 2025 · Updated 9 months ago
zhangzaibin / spagent
SPAgent, a foundation agent for understanding, reasoning over, and operating within the physical and spatial world.
☆208 · Updated this week
VisualAIKHU / Keyword-DETR
Official Repository for "Watch Video, Catch Keyword: Context-aware Keyword Attention for Moment Retrieval and Highlight Detection" (AAAI …
☆15 · Mar 1, 2025 · Updated last year
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
daeunni / StreamGaze
Code for "StreamGaze: Gaze-Guided Temporal Reasoning and Proactive Understanding in Streaming Videos"
☆27 · May 13, 2026 · Updated 2 months ago