OpenEnvision/Awesome-Multimodal-Modeling

Awesome Multimodal Modeling [Covers MLLM, UMM, and NMM]

☆508

Alternatives and similar repositories for Awesome-Multimodal-Modeling

Users interested in Awesome-Multimodal-Modeling are comparing it to the repositories listed below.

OpenEnvision / BlogrXiv
BlogrXiv - AI Research Blog Discovery
☆131 · Updated this week
OpenEnvision / AutoRubric-as-Reward
Auto-Rubric as Reward: From Implicit Preference to Explicit Generative Criteria
☆50 · Updated this week
OpenEnvision / Awesome-Multimodal-Agent
Awesome Visual Agent
☆19 · Jul 1, 2026 · Updated 3 weeks ago
OpenEnvision / WorldFoundry
Unified World Model Inference & Evaluation Infrastructure
☆274 · Jul 22, 2026 · Updated last week
tianshijing / ScalingOpt
ScalingOpt - Optimization Community
☆104 · Jun 1, 2026 · Updated last month
OpenRaiser / Envision
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
☆32 · Jan 9, 2026 · Updated 6 months ago
ATH-MaaS / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆1,306 · Mar 24, 2026 · Updated 4 months ago
xie-lab-ml / Meissonic-Inference
Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
☆16 · Nov 21, 2024 · Updated last year
JinXins / MergeMix
[ICLR 2026] MergeMix: A Unified Augmentation Paradigm for Visual and Multi-Modal Understanding
☆21 · Feb 27, 2026 · Updated 5 months ago
FouierL / EquS
[WACV 2026] Official code of the paper “Equivariant Sampling for Improving Diffusion Model-based Image Restoration”
☆19 · Jan 29, 2026 · Updated 6 months ago
xie-lab-ml / Mano-Restriking-Manifold-Optimization-for-LLM-Training
The official code of "Mano: Restriking Manifold Optimization for LLM Training".
☆25 · Jun 1, 2026 · Updated last month
Purshow / Awesome-Unified-Multimodal
📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.
☆365 · Jan 8, 2026 · Updated 6 months ago
EvolvingLMMs-Lab / Evolving-Visual-Generation
[Roadmap] Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
☆125 · Jun 9, 2026 · Updated last month
Alrightlone / SparAlloc
SparAlloc: A Simple and Modular Framework for Decoupled Sparsity Allocation in Layerwise Pruning for LLM
☆16 · Jun 5, 2025 · Updated last year
yunzeliu / awesome-unified-embedding
A curated list of papers, models, datasets, and benchmarks for unified multi-modal embedding models.
☆43 · Apr 29, 2026 · Updated 3 months ago
OpenDCAI / DataFlow-MM
Dataflow-MM, multi-media operators for Dataflow. We aim to prepare data for Multimodal Large Language Models.
☆49 · Apr 13, 2026 · Updated 3 months ago
Gen-Verse / WideRange4D
WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
☆111 · Mar 19, 2025 · Updated last year
thinkwee / AwesomeOPD
Awesome List for On-Policy Distillation
☆773 · Updated this week
knightnemo / Awesome-World-Models
A Curated List of Awesome Works in World Modeling, Aiming to Serve as a One-stop Resource for Researchers, Practitioners, and Enthusiasts…
☆3,244 · Updated this week
mm-vl / ULM-R1
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
☆48 · Jul 22, 2025 · Updated last year
showlab / Awesome-Unified-Multimodal-Models
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
☆830 · Oct 10, 2025 · Updated 9 months ago
worldbench / awesome-spatial-intelligence
🌐 Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems
☆150 · Jul 12, 2026 · Updated 2 weeks ago
leeruibin / hybrid-forcing
☆32 · Apr 29, 2026 · Updated 3 months ago
Sphere-AI-Lab / fda
Implementation of <Model Merging with Functional Dual Anchors>
☆46 · Nov 23, 2025 · Updated 8 months ago
zhaochen0110 / OpenThinkIMG
OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.
☆399 · Jun 1, 2025 · Updated last year
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆19 · Jun 16, 2024 · Updated 2 years ago
agents-x-project / TIR-Bench
[ECCV 2026] Official implementation of "TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning"
☆25 · Feb 8, 2026 · Updated 5 months ago
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆739 · Jul 22, 2026 · Updated last week
simchowitzlabpublic / nano-world-model
A Minimalist, Batteries-included Repository for Advancing World Model Science.
☆697 · Jun 15, 2026 · Updated last month
doem97 / ICLR26_mtLoRA
[ICLR 2026] Official implementation (Claude Agent reproduce supported) of paper "mtLoRA: Scalable Multi-Task Low-Rank Model Adaptation" +…
☆17 · Mar 4, 2026 · Updated 4 months ago
FYYDCC / IVT-LR
Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”
☆18 · Jan 27, 2026 · Updated 6 months ago
Sphere-AI-Lab / poet
Implementation for POET and POET-X for LLM pretraining
☆38 · Jun 9, 2026 · Updated last month
pengsida / learning_research
My personal research experience
☆13,496 · Jun 6, 2026 · Updated last month
meituan-longcat / LongCat-Next
☆464 · Jul 21, 2026 · Updated last week
stepfun-ai / NextStep-1
[🚀 ICLR 2026 Oral] NextStep-1: SOTA Autoregressive Image Generation with Continuous Tokens. A research project developed by the StepFun’s …
☆693 · Feb 27, 2026 · Updated 5 months ago
solaris-wm / solaris
The first multiplayer video world model in Minecraft
☆220 · Mar 3, 2026 · Updated 4 months ago
mll-lab-nu / Awesome-Spatial-Intelligence-in-VLM
A paper list for spatial reasoning
☆767 · Jan 19, 2026 · Updated 6 months ago
lhxcs / DVD-Quant
☆17 · Oct 5, 2025 · Updated 9 months ago
Alrightlone / OBS-Diff
[ICLR 2026] Official implementation of "OBS-Diff".
☆66 · Mar 5, 2026 · Updated 4 months ago