jacklishufan / LaViDaLinks

Official Implementation of LaViDa: :A Large Diffusion Language Model for Multimodal Understanding

☆174

Alternatives and similar repositories for LaViDa

Users that are interested in LaViDa are comparing it to the libraries listed below

Sorting:

facebookresearch / metamorph
Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning
☆222Updated 7 months ago
ML-GSAI / LLaDA-V
☆288Updated last month
M-E-AGI-Lab / Muddit
Official Implementation of Muddit [Meissonic II]: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model.
☆95Updated last month
UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆197Updated 4 months ago
Yxxxb / VoCo-LLaMA
[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
☆195Updated 5 months ago
Fr0zenCrane / UniCoT
Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
☆173Updated last week
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆72Updated 2 months ago
TIGER-AI-Lab / Pixel-Reasoner
Pixel-Level Reasoning Model trained with RL [NeuIPS25]
☆251Updated 3 weeks ago
chenllliang / DnD-Transformer
[ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr…
☆79Updated 11 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆103Updated last month
TencentARC / GRPO-CARE
☆79Updated 5 months ago
bronyayang / Law_of_Vision_Representation_in_MLLMs
[COLM'25] Official implementation of the Law of Vision Representation in MLLMs
☆170Updated last month
yu-rp / Dimple
Dimple, the first Discrete Diffusion Multimodal Large Language Model
☆112Updated 4 months ago
HL-hanlin / Bifrost-1
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
☆43Updated last week
OpenGVLab / MMIU
[ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆90Updated last year
Yangyi-Chen / SOLO
[TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"
☆147Updated last year
TIGER-AI-Lab / VL-Rethinker
The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]
☆166Updated 6 months ago
TencentARC / MindOmni
☆135Updated last month
zhijie-group / Orthus
☆64Updated 6 months ago
facebookresearch / webssl
Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).
☆189Updated 7 months ago
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆163Updated 6 months ago
rese1f / aurora
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
☆133Updated 6 months ago
TIGER-AI-Lab / Vamba
Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]
☆95Updated 4 months ago
yix8 / VisualPlanning
Visual Planning: Let's Think Only with Images
☆283Updated 6 months ago
DAMO-NLP-SG / DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
☆73Updated last year
Cooperx521 / ScaleCap
Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’
☆58Updated 5 months ago
Tiezheng11 / Vision-Language-Vision
☆63Updated 4 months ago
TencentARC / SEED-Bench-R1
☆94Updated 5 months ago
NVlabs / QLIP
[arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation
☆94Updated 9 months ago
egolife-ai / Ego-R1
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆131Updated 3 months ago