MIO: A Foundation Model on Multimodal Tokens
☆34Dec 13, 2024Updated last year
Alternatives and similar repositories for MIO
Users who are interested in MIO are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated 10 months ago
- Official repository of Myna: Masking-Based Contrastive Learning of Musical Representations☆17Mar 31, 2025Updated 11 months ago
- Lyra V2 WebAssembly build☆29Sep 23, 2024Updated last year
- WIP: Ofen is a toolkit aimed at making transformer models production-ready. API included☆17Oct 2, 2024Updated last year
- Forced alignment decoder for Whisper.☆14Mar 13, 2024Updated last year
- ☆18Apr 19, 2024Updated last year
- ☆15Apr 13, 2025Updated 10 months ago
- Lyra V2 (SoundStream) running in the browser☆19Sep 20, 2023Updated 2 years ago
- A real-time implementation of DDSP from Google Magenta.☆15Nov 8, 2021Updated 4 years ago
- A neural vocoder supporting Ring Attention, Conformer, and NSF.☆24Aug 1, 2025Updated 7 months ago
- ☆11Sep 12, 2025Updated 5 months ago
- Code repo for "Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers" (ACL 2023)☆22Nov 1, 2023Updated 2 years ago
- Speech Resynthesis and Language Modeling☆27Jun 11, 2025Updated 8 months ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil…☆40Jun 17, 2025Updated 8 months ago
- applying audio FX with text descriptors☆33Nov 12, 2025Updated 3 months ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆48Jan 19, 2026Updated last month
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆23Mar 12, 2024Updated last year
- Official Implementation of EnCLAP (ICASSP 2024)☆94Jun 2, 2024Updated last year
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆101Jul 24, 2024Updated last year
- ☆54Jul 16, 2025Updated 7 months ago
- A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.☆66Oct 28, 2024Updated last year
- A neural speech codec based on discrete WavLM representations☆24Aug 28, 2024Updated last year
- An evolutionary algorithm-based optimization for tracking weights in the OpenSim Residual Reduction Algorithm (RRA).☆11Jul 17, 2023Updated 2 years ago
- Project for MIDI to Audio Synthesis☆27Mar 13, 2023Updated 2 years ago
- A collection of pre-trained audio models, in PyTorch.☆116Jan 27, 2023Updated 3 years ago
- ☆37Dec 16, 2020Updated 5 years ago
- Principal Component Analysis (PCA) in PyTorch.☆39Jul 10, 2025Updated 7 months ago
- Syllable Segmentation and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model☆34Aug 27, 2023Updated 2 years ago
- PAM is a no-reference audio quality metric for audio generation tasks☆76Jul 19, 2024Updated last year
- Codebase and project page for EDMSound☆35Nov 20, 2023Updated 2 years ago
- ☆36Feb 26, 2024Updated 2 years ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors☆36Feb 11, 2025Updated last year
- Codebase for fine-tuning Llama2 70B to generate math test questions and answers.☆11Aug 30, 2024Updated last year
- The official repo of continuous speculative decoding☆31Mar 28, 2025Updated 11 months ago
- My vocoder experiments☆31Jul 26, 2025Updated 7 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…☆154May 30, 2025Updated 9 months ago
- Supercharge huggingface transformers with model parallelism.☆78Jul 23, 2025Updated 7 months ago
- ☆34Jun 15, 2021Updated 4 years ago