BytedanceDouyinContent/SAIL-VL2

The SAIL-VL2 series model developed by the BytedanceDouyinContent Group

☆79

Alternatives and similar repositories for SAIL-VL2

Users who are interested in SAIL-VL2 are comparing it to the repositories listed below.

Kwai-Keye / Keye
☆808 · Jun 10, 2026 · Updated last month
RUCAIBox / Event-Bench
Official code of *Towards Event-oriented Long Video Understanding*
☆12 · Jul 26, 2024 · Updated 2 years ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
mbzuai-oryx / Video-R2
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19 · Jan 21, 2026 · Updated 6 months ago
ludc506 / InternVL-X
☆16 · Mar 26, 2025 · Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated last year
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 9 months ago
Vision-CAIR / Infinibench
Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows
☆20 · Nov 4, 2025 · Updated 8 months ago
ByteDance-Seed / Seed1.5-VL
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583 · Jun 14, 2025 · Updated last year
studio-dots-ai / dots.vlm1
The official repository of the dots.vlm1 instruct models proposed by rednote-hilab.
☆289 · Sep 26, 2025 · Updated 10 months ago
ylingfeng / Add-SD
Official implementation of Add-SD: Rational Generation without Manual Reference.
☆28 · Aug 19, 2024 · Updated last year
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆152 · Jul 3, 2025 · Updated last year
allenai / SAGE
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
☆70 · Dec 17, 2025 · Updated 7 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,154 · Updated this week
TencentARC / Video-Holmes
[ECCV 2026] Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?
☆95 · Jul 13, 2025 · Updated last year
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
allenai / olmix
☆41 · May 26, 2026 · Updated 2 months ago
L-O-I / RRVF
☆18 · Aug 7, 2025 · Updated 11 months ago
Hon-Wong / ByteVideoLLM
[ICCV 2025] Dynamic-VLM
☆28 · Dec 16, 2024 · Updated last year
Time-Search / TimeSearch-R
[ICLR 2026] Official code for paper: TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinf…
☆27 · Jan 29, 2026 · Updated 6 months ago
YihongT / LLMSynthor
☆21 · Jul 3, 2025 · Updated last year
dengandong / GroundMoRe
☆18 · May 18, 2026 · Updated 2 months ago
mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆1,048 · Oct 15, 2025 · Updated 9 months ago
THUMAI-Lab / LLaVA-UHD-v4
☆47 · Jun 7, 2026 · Updated last month
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
HuiGuanLab / RaTSG
This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"
☆13 · Aug 22, 2025 · Updated 11 months ago
yannqi / R-4B
The official repository of "R-4B: Incentivizing General-Purpose Auto-Thinking Capability in MLLMs via Bi-Mode Integration"
☆141 · Sep 4, 2025 · Updated 10 months ago
yhzhu99 / FengruCupTemplate
Beihang University "Fengru Cup" paper template (2022)
☆12 · Apr 24, 2022 · Updated 4 years ago
qzp2018 / UniECS
Official implementation of the CIKM 2025 paper "UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion"
☆21 · Sep 17, 2025 · Updated 10 months ago
ChangyaoTian / ADDP
The official implementation of ADDP (ICLR 2024)
☆12 · Mar 27, 2024 · Updated 2 years ago
ByteDance-Seed / Seed-1.8
☆219 · Dec 19, 2025 · Updated 7 months ago
zjucsq / PLA
[ICLR 2023] Video Scene Graph Generation from Single-Frame Weak Supervision
☆12 · Sep 17, 2023 · Updated 2 years ago
MoonshotAI / WorldVQA
☆119 · Feb 4, 2026 · Updated 5 months ago
Kimyounggun99 / VRU-Accident
☆15 · Nov 17, 2025 · Updated 8 months ago
xdxie / WAS_WordArt-Segmentation
The official code and datasets for Artistic Text Segmentation (ECCV 2024).
☆30 · Sep 24, 2025 · Updated 10 months ago
WeitaiKang / SegVG
[ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
☆63 · Oct 22, 2024 · Updated last year
path2generalist / General-Level
On Path to Multimodal Generalist: General-Level and General-Bench
☆21 · Jul 11, 2025 · Updated last year
tengteng95 / Spatial_Ensemble
☆18 · Dec 25, 2021 · Updated 4 years ago
cankocagil / TT-SRN
TT-SRN: Twin Transformers with Sinusoidal Representation Networks for Video Instance Segmentation
☆16 · Oct 8, 2021 · Updated 4 years ago