MIO: A Foundation Model on Multimodal Tokens
☆34 · Dec 13, 2024 · Updated last year
Alternatives and similar repositories for MIO
Users who are interested in MIO are comparing it to the repositories listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆31 · Dec 23, 2024 · Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding" ☆11 · Apr 10, 2025 · Updated last year
- ☆12 · Mar 11, 2025 · Updated last year
- Official implementation of the paper - GD-Retriever: Controllable generative text-music retrieval with diffusion models (Accepted at ISMI… ☆17 · Sep 25, 2025 · Updated 8 months ago
- Forced alignment decoder for Whisper. ☆16 · Mar 13, 2024 · Updated 2 years ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF. ☆25 · Aug 1, 2025 · Updated 9 months ago
- Official repository of Myna: Masking-Based Contrastive Learning of Musical Representations ☆17 · Mar 31, 2025 · Updated last year
- ☆18 · Apr 19, 2024 · Updated 2 years ago
- Lyra V2 (SoundStream) running in the browser ☆19 · Sep 20, 2023 · Updated 2 years ago
- ☆15 · Apr 13, 2025 · Updated last year
- A real time implementation of the ddsp from google magenta. ☆16 · Nov 8, 2021 · Updated 4 years ago
- Project for MIDI to Audio Synthesis ☆27 · Mar 13, 2023 · Updated 3 years ago
- The official implementation of the paper "Affective Faces for Goal-Driven Dyadic Communication." ☆15 · Jan 27, 2023 · Updated 3 years ago
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models" ☆14 · Nov 11, 2024 · Updated last year
- [NeurIPS 2025] The official code for "IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation" ☆22 · Jun 5, 2025 · Updated 11 months ago
- applying audio FX with text descriptors ☆34 · Nov 12, 2025 · Updated 6 months ago
- A TTS Trained on Universal Audio. ☆41 · Jun 6, 2025 · Updated 11 months ago
- A neural speech codec based on discrete WavLM representations ☆26 · Aug 28, 2024 · Updated last year
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models" ☆24 · Mar 8, 2026 · Updated 2 months ago
- A collection of pre-trained audio models, in PyTorch. ☆116 · Jan 27, 2023 · Updated 3 years ago
- A PyTorch implementation of NormSoftmax based on BMVC 2019 paper "Classification is a Strong Baseline for Deep Metric Learning" ☆10 · Mar 15, 2020 · Updated 6 years ago
- ☆15 · Aug 4, 2024 · Updated last year
- ☆21 · Mar 3, 2026 · Updated 2 months ago
- ☆55 · Jul 16, 2025 · Updated 10 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆101 · Jul 24, 2024 · Updated last year
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil… ☆55 · Apr 17, 2026 · Updated last month
- ☆25 · Nov 25, 2025 · Updated 6 months ago
- An efficient distillation method for flow matching models ☆26 · Feb 1, 2026 · Updated 3 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆95 · Jun 2, 2024 · Updated last year
- Finetuning Stable Diffusion from Diffusers ☆11 · Mar 11, 2024 · Updated 2 years ago
- My vocoder experiments ☆31 · Jul 26, 2025 · Updated 10 months ago
- ☆41 · May 15, 2023 · Updated 3 years ago
- Physics-based Zero-Shot Video Generation ☆31 · Oct 4, 2024 · Updated last year
- We introduce OpenStory++, a large-scale open-domain dataset focusing on enabling MLLMs to perform storytelling generation tasks. ☆18 · Aug 30, 2024 · Updated last year
- The official repo of continuous speculative decoding ☆34 · Mar 28, 2025 · Updated last year
- Open ChatGLM Eyes to See the World ☆13 · Mar 30, 2023 · Updated 3 years ago
- Codebase and project page for EDMSound ☆35 · Nov 20, 2023 · Updated 2 years ago
- [AAAI'26] Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression ☆19 · Dec 21, 2025 · Updated 5 months ago
- ☆26 · Nov 17, 2025 · Updated 6 months ago