MIO-Team / MIO
MIO: A Foundation Model on Multimodal Tokens
☆22 · Updated 3 months ago
Alternatives and similar repositories for MIO:
Users interested in MIO are comparing it to the libraries listed below:
- A project for tri-modal LLM benchmarking and instruction tuning. ☆24 · Updated last month
- Official code implementation of Preference Alignment with Flow Matching (NeurIPS 2024) ☆45 · Updated 4 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer ☆29 · Updated 3 months ago
- HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆51 · Updated last month
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model ☆21 · Updated 7 months ago
- LMM solved catastrophic forgetting (AAAI 2025) ☆40 · Updated 4 months ago
- Official repository of the IJCAI 2024 paper "BATON: Aligning Text-to-Audio Model with Human Preference Feedback" ☆23 · Updated 3 weeks ago
- An official PyTorch implementation of the AAAI 2024 paper "Latent Space Editing in Transformer-based Flow Matching" ☆36 · Updated 11 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 · Updated 9 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆33 · Updated last month
- LLaVA combined with the MAGVIT image tokenizer: trains an MLLM without a vision encoder, unifying image understanding and generation. ☆35 · Updated 9 months ago
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 · Updated 2 weeks ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters". ☆44 · Updated 3 months ago
- The official PyTorch implementation of Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign) ☆69 · Updated 5 months ago
- PyTorch implementation of StableMask (ICML'24) ☆12 · Updated 9 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, currently implemented with AudioLDM2 as the base model. ☆33 · Updated 3 weeks ago
- Official PyTorch implementation of EMOVA (CVPR 2025, https://arxiv.org/abs/2409.18042) ☆23 · Updated 2 weeks ago
- The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows ☆57 · Updated 2 weeks ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective ☆66 · Updated 5 months ago
- Long-Term Rhythmic Video Soundtracker (ICML 2023) ☆57 · Updated 8 months ago
- Personal website ☆16 · Updated 3 months ago
- PyTorch implementation of the model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆26 · Updated 2 months ago
- Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆56 · Updated last year
- [CVPR 2024] ModaVerse: Efficiently Transforming Modalities with LLMs ☆29 · Updated 8 months ago
- Official PyTorch implementation of Task Vectors are Cross-Modal ☆22 · Updated 3 months ago
- Official code implementation of "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models" ☆16 · Updated 8 months ago