MIO-Team / MIOLinks

MIO: A Foundation Model on Multimodal Tokens

☆30

Alternatives and similar repositories for MIO

Users that are interested in MIO are comparing it to the libraries listed below

Sorting:

multimodal-art-projection / OmniBench
A project for tri-modal LLM benchmarking and instruction tuning.
☆50Updated 7 months ago
jadehaus / preference-flow-matching
Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024)
☆62Updated last year
hamishivi / tess-2
Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"
☆51Updated 9 months ago
lzw-lzw / UnifiedMLLM
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
☆22Updated last year
HelmholtzAI-FZJ / flex_gen
☆19Updated 10 months ago
OpenNLPLab / TAVGBench
Demo page of TAVGBench: Benchmarking Text to Audible-Video Generation
☆14Updated 7 months ago
philippe-eecs / small-vision
A big_vision inspired repo that implements a generic Auto-Encoder class capable in representation learning and generative modeling.
☆34Updated last year
kakaobrain / magvlt
The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23)
☆27Updated last year
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆47Updated 8 months ago
kyegomez / Mirasol
Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"
☆25Updated 9 months ago
HaozheZhao / MENTOR
☆30Updated 4 months ago
BriansIDP / AudioVisualLLM
☆19Updated last year
thunlp / ACDiT
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
☆38Updated 10 months ago
mlvlab / CAF
Official Implementation (Pytorch) of "Constant Acceleration Flow", NeurIPS 2024
☆33Updated 3 weeks ago
zh460045050 / VQGAN-LC
☆139Updated last year
MikaStars39 / StableMask
PyTorch implementation of StableMask (ICML'24)
☆14Updated last year
luping-liu / LongAlign
The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)
☆80Updated 6 months ago
XiaomiMiMo / MiMo-Audio-Eval
☆69Updated last month
ali-vilab / alitok
AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
☆49Updated last month
Gen-Verse / HermesFlow
[NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
☆71Updated 2 months ago
Yuanshi9815 / LiteFocus
[Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA model, now implemented with the base model AudioLDM2.
☆34Updated 8 months ago
lucasjinreal / LLaVA-Magvit2
LLaVA combines with Magvit Image tokenizer, training MLLM without an Vision Encoder. Unifying image understanding and generation.
☆37Updated last year
FreedomIntelligence / FusionAudio
Towards Fine-grained Audio Captioning with Multimodal Contextual Cues
☆83Updated last month
danny911kr / REALTALK
Evaluate your agent memory on real-world dialogues, not LLM-simulated dialogues.
☆35Updated 4 months ago
j-min / VPGen
Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)
☆56Updated 2 years ago
DAMO-NLP-SG / DiGIT
[NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
☆72Updated last year
adobe-research / ImageFolder
☆19Updated 11 months ago
haoliuhl / language-quantized-autoencoders
Language Quantized AutoEncoders
☆111Updated 2 years ago
Neur-IO / ReVQ
Explore how to get a VQ-VAE models efficiently!
☆62Updated 3 months ago
Victorwz / MLM_Filter
Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".
☆68Updated 7 months ago