apple/ml-unigen

UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation

☆43

Alternatives and similar repositories for ml-unigen

Users who are interested in ml-unigen are comparing it to the repositories listed below.

wdrink / OpenTokenizer
☆21 • Jan 17, 2025 • Updated last year
MengLcool / SliMM
☆25 • Dec 26, 2024 • Updated last year
OdedH / textual-pca
Official implementation of "Describing Sets of Images with Textual-PCA".
☆16 • Feb 13, 2023 • Updated 3 years ago
wdrink / RepWAM
Code for RepWAM: World Action Modeling with Representation Visual-Action Tokenizers
☆57 • Jun 14, 2026 • Updated last month
wdrink / ARM
ARM: An AutoRegressive Large Multimodal Model with Discrete Representations
☆50 • Jun 10, 2026 • Updated last month
Maplebb / UniREditBench
[ECCV 2026] Offline implementation of UniREditBench: A Unified Reasoning-based Image Editing Benchmark.
☆58 • Jun 21, 2026 • Updated last month
HuiZhang0812 / WeEdit
A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing
☆20 • Mar 13, 2026 • Updated 4 months ago
apple / ml-gie-bench
☆20 • Jul 24, 2025 • Updated 11 months ago
JPShi12 / VideoLoom
[ICML 2026] VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding
☆27 • Jul 3, 2026 • Updated 2 weeks ago
HenryJunW / TAG
☆22 • Dec 8, 2022 • Updated 3 years ago
YBYBZhang / Tool-R1
Official pytorch implementation of "Tool-R1: Sample-Efficient Reinforcement Learning for Agentic Tool Use"
☆20 • Sep 16, 2025 • Updated 10 months ago
inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆40 • Feb 20, 2025 • Updated last year
ozzafar / count_token_optimization
☆16 • Sep 6, 2024 • Updated last year
apple / ml-ppg-age-analysis
☆16 • Aug 20, 2025 • Updated 11 months ago
apple / ml-ui-jepa
☆16 • Apr 25, 2025 • Updated last year
salesforce / woad-pytorch
This is the pytorch implementation of WOAD: Weakly Supervised Online Action Detection in Untrimmed Videos (CVPR2021).
☆13 • May 1, 2025 • Updated last year
csmliu / pretrained-GANs
A Survey on Leveraging Pre-trained Generative Adversarial Networks for Image Editing and Restoration
☆17 • Jul 22, 2022 • Updated 3 years ago
xumingze0308 / TRN.pytorch
[ICCV 2019] Official implementation of Temporal Recurrent Networks for Online Action Detection
☆85 • Jul 21, 2022 • Updated 4 years ago
Tianhao-Qi / Mask2DiT
CVPR 2025 Accepted Papers
☆26 • Dec 20, 2025 • Updated 7 months ago
FudanCVL / SceneDesigner
[NeurIPS 2025 (Spotlight)] SceneDesigner: Controllable Multi-Object Image Generation with 9-DoF Pose Manipulation
☆30 • Dec 19, 2025 • Updated 7 months ago
FoundationVision / OmniTokenizer
[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
☆325 • Jul 9, 2024 • Updated 2 years ago
spatigen / milr
Official code of paper: MILR: Improving Multimodal Image Generation via Test-Time Latent Reasoning
☆18 • Feb 12, 2026 • Updated 5 months ago
Hiahia1369 / StableShadowRemoval
☆26 • Sep 8, 2025 • Updated 10 months ago
Tencent / HaploVLM
ICML2025
☆63 • Aug 28, 2025 • Updated 10 months ago
caiyuanhao1998 / Open-PhyGDPO
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation (ECCV 2026)
☆67 • Jun 20, 2026 • Updated last month
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 • Jun 17, 2024 • Updated 2 years ago
sen-ye / R3
[ICLR26] Understanding VS. Generation: Navigating Optimization Dilemma in Multimodal Models
☆25 • May 6, 2026 • Updated 2 months ago
salesforce / QVR-SimpleDLM
Pytorch Implementation of Value Retrieval with Arbitrary Queries for Form-like Documents.
☆16 • May 1, 2025 • Updated last year
Osilly / Interleaving-Reasoning-Generation
[ICLR 2026] This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA bench…
☆100 • Jan 26, 2026 • Updated 5 months ago
amazon-science / long-short-term-transformer
[NeurIPS 2021 Spotlight] Official implementation of Long Short-Term Transformer for Online Action Detection
☆140 • Jul 25, 2024 • Updated last year
csuhan / Tar
[NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
☆202 • Sep 18, 2025 • Updated 10 months ago
wdrink / SimpleAR
Pytorch implementation for the paper titled "SimpleAR: Pushing the Frontier of Autoregressive Visual Generation"
☆431 • Jun 20, 2025 • Updated last year
KlingAIResearch / DiffMoE
[Arxiv 2025] Official PyTorch implementation of DiffMoE, TC-DiT, EC-DiT and Dense DiT
☆175 • Oct 21, 2025 • Updated 9 months ago
apple / visatronic-demo
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis
☆15 • May 28, 2025 • Updated last year
KlingAIResearch / SVG-T2I
[Arxiv 2025] Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder…
☆152 • Dec 18, 2025 • Updated 7 months ago
FudanCVL / SAAS
[AAAI 2026] Segment Anything Across Shots: A Method and Benchmark
☆29 • Nov 16, 2025 • Updated 8 months ago
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆22 • Updated this week
LoyoYang / DeCoTa
ICCV 2021: Deep Co-Training with Task Decomposition for Semi-supervised Domain Adaptation
☆16 • Dec 8, 2022 • Updated 3 years ago
hithqd / ReasonBrain
[ICML 2026] Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning
☆27 • May 18, 2026 • Updated 2 months ago