MajorDavidZhang/Generalization_unified_VLM

MajorDavidZhang / Generalization_unified_VLM

☆24

Alternatives and similar repositories for Generalization_unified_VLM

Users that are interested in Generalization_unified_VLM are comparing it to the libraries listed below.

arctanxarc / GENIUS
☆42 · May 9, 2026 · Updated 2 months ago
TuringEyeTest / TuringEyeTest
Pixels, Patterns, but no Poetry: To See the World like Humans
☆18 · Aug 11, 2025 · Updated 11 months ago
magic-research / vector_quantization
[NeurIPS 2024] Image Understanding Makes for A Good Tokenizer for Image Generation
☆21 · Dec 17, 2024 · Updated last year
wangf3014 / VTok
Official implementation of VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents
☆15 · Feb 5, 2026 · Updated 5 months ago
hwanyu112 / VIBE-Benchmark
☆27 · Feb 3, 2026 · Updated 5 months ago
yuleiqin / RAIF
A Recipe for Building LLM Reasoners to Solve Complex Instructions
☆32 · Oct 9, 2025 · Updated 9 months ago
QC-LY / UiG
Code for "Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation"
☆15 · Nov 11, 2025 · Updated 8 months ago
huawei-lin / VTBench
This repository provides the official implementation of VTBench, a benchmark designed to evaluate the performance of visual tokenizers (V…
☆35 · Jul 30, 2025 · Updated 11 months ago
wusize / OpenUni
☆189 · Jun 27, 2025 · Updated last year
hkust-nlp / model-task-align-rl
[ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions".
☆18 · Feb 9, 2026 · Updated 5 months ago
AgenticIR-Lab / OThink-R1
This is the official code for the OThink-R1 project.
☆21 · Jun 19, 2025 · Updated last year
yuexy / ST-AR
☆14 · Sep 22, 2025 · Updated 10 months ago
Fr0zenCrane / UniCoT
[ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆233 · May 31, 2026 · Updated last month
wusize / Harmon
[ICCV2025] Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
☆191 · May 21, 2025 · Updated last year
zhentao-zou / MURE
Beyond Textual CoT: Interleaved Text-image chains with Deep Confidence Reasoning for Image Editing
☆19 · Jun 24, 2026 · Updated 3 weeks ago
PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 7 months ago
SxJyJay / UniToken
[CVPRW 2025] UniToken is an auto-regressive generation model that combines discrete and continuous representations to process visual inpu…
☆106 · Apr 23, 2025 · Updated last year
visual-gen / semanticist
(ICCV 2025) "Principal Components" Enable A New Language of Images
☆86 · Jun 4, 2026 · Updated last month
fudan-zvg / UniUGG
UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding. Accepted to ICLR 2026.
☆63 · Updated this week
HumanMLLM / ViSpeak
(ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"
☆53 · Jul 1, 2025 · Updated last year
zhouyiks / CoLVA
☆44 · Jul 9, 2025 · Updated last year
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137 · Jan 30, 2026 · Updated 5 months ago
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
wren93 / tuna
☆94 · Apr 29, 2026 · Updated 2 months ago
showlab / UniRL
The code repository of UniRL
☆53 · May 30, 2025 · Updated last year
chengzu-li / MVoT
Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)
☆78 · Apr 12, 2025 · Updated last year
sen-ye / R3
[ICLR26] Understanding VS. Generation: Navigating Optimization Dilemma in Multimodal Models
☆25 · May 6, 2026 · Updated 2 months ago
zhu733756 / searchengine
Metasearch engine searchengine: metadata metasearch
☆15 · Jul 19, 2020 · Updated 6 years ago
TencentARC / SEED-Bench-R1
☆100 · Jun 23, 2025 · Updated last year
HorizonWind2004 / reconstruction-alignment
[ICLR 2026] Official repo of paper "Reconstruction Alignment Improves Unified Multimodal Models". Unlocking the Massive Zero-shot Potenti…
☆410 · May 23, 2026 · Updated last month
Osilly / Interleaving-Reasoning-Generation
[ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…
☆100 · Jan 26, 2026 · Updated 5 months ago
hithqd / ReasonBrain
[ICML2026] Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
☆27 · May 18, 2026 · Updated 2 months ago
nicholasly / HDP-Net
Test Demo for “HDP-Net: Haze Density Prediction Network for Nighttime Dehazing” PCM 2018
☆12 · Sep 24, 2018 · Updated 7 years ago
mfandre / GanttEcharts
Gantt Chart using echarts
☆13 · Mar 31, 2021 · Updated 5 years ago
Theia-4869 / MoSA
Official code of MoSA (Mixture of Sparse Adapters).
☆13 · Dec 14, 2023 · Updated 2 years ago
ErikZ719 / CoTA
[ICLR 26] Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
☆16 · Mar 6, 2026 · Updated 4 months ago
jiaming-zhou / Zero-WAM
Zero-WAM, an in-context world model for zero-shot robotic task generalization
☆31 · Jul 8, 2026 · Updated 2 weeks ago
onemsg / awesome-project
Assorted projects built during university, using Java/Python/JavaScript/Vert.X/SpringBoot
☆10 · Feb 28, 2022 · Updated 4 years ago
univ-esuty / ambifusion
Official repository for the paper "ambigram generation by a diffusion model".
☆17 · Aug 9, 2023 · Updated 2 years ago