TencentCloudADP/youtu-vl

Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision

☆167

Alternatives and similar repositories for youtu-vl

Users who are interested in youtu-vl are comparing it to the repositories listed below.

tencent-ailab / Penguin-VL
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report]
☆205 · Mar 30, 2026 · Updated 3 months ago
NMM-Roadmap / Awesome-NMM-List
☆55 · Jun 3, 2026 · Updated last month
w1oves / hqclip
[ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets
☆67 · Aug 6, 2025 · Updated 11 months ago
HVision-NKU / DenseVLM
[ICCV 2025] Unbiased Region-Language Alignment for Open-Vocabulary Dense Prediction
☆53 · Sep 22, 2025 · Updated 10 months ago
WeChatCV / WeDetect
(CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"
☆243 · Jun 7, 2026 · Updated last month
lzyhha / HSSL
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
☆15 · May 2, 2025 · Updated last year
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
facebookresearch / stepdiff
Data release for Step Differences in Instructional Video (CVPR24)
☆15 · Jun 19, 2024 · Updated 2 years ago
ByteVisionLab / NextFlow
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
☆331 · Jan 9, 2026 · Updated 6 months ago
IDEA-Research / Rex-Thinker
[ICLR-2026] Rex-Thinker: Grounded Object Referring via Chain-of-Thought Reasoning
☆150 · Jun 30, 2025 · Updated last year
wendell0218 / Janus-Pro-R1
[NeurIPS 2025] Official repository of the paper "Unlocking Aha Moments via Reinforcement Learning: Advancing Collaborative Visual Compreh…
☆23 · Sep 27, 2025 · Updated 9 months ago
FishAndWasabi / Real-LOD
Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"
☆34 · Apr 20, 2025 · Updated last year
WeChatCV / ObjEmbed
(ICML 2026) Official repository of paper "ObjEmbed: Towards Universal Multimodal Object Embeddings"
☆51 · May 18, 2026 · Updated 2 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,147 · Updated this week
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 · Aug 11, 2025 · Updated 11 months ago
mhh0318 / OneMoreStep
☆25 · Nov 30, 2023 · Updated 2 years ago
UCSC-VLAA / OpenVision
OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
☆487 · Feb 21, 2026 · Updated 5 months ago
Luo-Yihong / DGPO
[ICLR 2026][Ultra Fast&Powerful Diffusion RL] Reinforcing Diffusion Models by Direct Group Preference Optimization
☆84 · May 26, 2026 · Updated last month
IDEA-Research / Rex-Omni
[CVPR2026] Detect Anything via Next Point Prediction
☆1,514 · Feb 22, 2026 · Updated 5 months ago
HVision-NKU / ASID-Caption
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Unde…
☆68 · Mar 3, 2026 · Updated 4 months ago
thuml / Reasoning-Visual-World
Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…
☆100 · Mar 9, 2026 · Updated 4 months ago
Mini-o3 / Mini-o3
Official Code for "Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search"
☆422 · Jan 29, 2026 · Updated 5 months ago
yu-rp / VisualPerceptionToken
☆136 · Mar 22, 2025 · Updated last year
ThinkMorph / ThinkMorph
[ICLR 2026] The official repository for paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
☆190 · May 1, 2026 · Updated 2 months ago
saccharomycetes / mllms_know
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
☆382 · Apr 20, 2025 · Updated last year
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
mlvlab / DeepVideoR1
[NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"
☆36 · Feb 22, 2026 · Updated 5 months ago
lyhisme / DeST
An official code for "A Decoupled Spatio-Temporal Framework for Skeleton-based Action Segmentation".
☆39 · Dec 15, 2023 · Updated 2 years ago
cvlab-kaist / VIRAL
Official implementation of "VIRAL: Visual Representation Alignment for MLLMs".
☆163 · Sep 21, 2025 · Updated 10 months ago
HVision-NKU / MutualForcing
☆58 · Apr 28, 2026 · Updated 2 months ago
debby-0527 / SAM3-I
Official code and resources for SAM3-I.
☆175 · Apr 14, 2026 · Updated 3 months ago
ShareLab-SII / UniAR
[ICML 2026] The official implementation of paper "Unified Multimodal Autoregressive Modeling with Shared Context—Visual Tokenizer is Key …
☆46 · Jul 13, 2026 · Updated last week
shengliu66 / FractionalReason
Official github repo for "Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute"
☆17 · Jun 30, 2025 · Updated last year
bytedance / UniVR
☆20 · Jul 16, 2026 · Updated last week
D2I-ai / dasd-thinking
☆105 · Jan 27, 2026 · Updated 5 months ago
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆74 · Jan 20, 2025 · Updated last year
zhaochen0110 / Awesome_Think_With_Images
Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…
☆1,494 · Mar 9, 2026 · Updated 4 months ago
fudan-zvg / UniUGG
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding. Accepted to ICLR 2026.
☆63 · Jul 16, 2026 · Updated last week