MIO-Team / MIO
MIO: A Foundation Model on Multimodal Tokens
☆21Updated 2 months ago
Alternatives and similar repositories for MIO:
Users who are interested in MIO are comparing it to the repositories listed below.
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024)☆41Updated 3 months ago
- A project for tri-modal LLM benchmarking and instruction tuning.☆23Updated last week
- ☆17Updated last month
- The official implementation of MAGVLT: Masked Generative Vision-and-Language Transformer (CVPR'23)☆26Updated last year
- Pytorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES"☆26Updated 3 weeks ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling.☆34Updated 7 months ago
- Implementation of the paper: "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning" in pytorch☆13Updated last week
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆21Updated 6 months ago
- Implementation of Retrieval-Augmented Denoising Diffusion Probabilistic Models in Pytorch☆64Updated 2 years ago
- An official pytorch implementation of AAAI 2024 paper "Latent Space Editing in Transformer-based Flow Matching"☆36Updated 10 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆29Updated last week
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild☆47Updated last year
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"☆73Updated 3 months ago
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.☆35Updated 8 months ago
- ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer☆25Updated last month
- An LMM that is a strict superset of its embedded LLM☆38Updated 3 months ago
- ☆11Updated 7 months ago
- Implementation of SoundStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆12Updated 3 weeks ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions.☆20Updated 8 months ago
- Language Quantized AutoEncoders☆99Updated 2 years ago
- The open source implementation of the cross attention mechanism from the paper: "JOINTLY TRAINING LARGE AUTOREGRESSIVE MULTIMODAL MODELS"☆26Updated 11 months ago
- [Interspeech 2024] LiteFocus is a tool designed to accelerate diffusion-based TTA models, now implemented with the base model AudioLDM2.☆33Updated 7 months ago
- trying to reproduce suno v3☆30Updated 3 weeks ago
- ☆47Updated last year
- Official Repository of IJCAI 2024 Paper: "BATON: Aligning Text-to-Audio Model with Human Preference Feedback"☆19Updated last month
- ☆86Updated last year