MIO: A Foundation Model on Multimodal Tokens
☆34Dec 13, 2024Updated last year
Alternatives and similar repositories for MIO
Users that are interested in MIO are comparing it to the libraries listed below.
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?"☆31Dec 23, 2024Updated last year
- ☆12Mar 11, 2025Updated last year
- Official implementation of the paper - GD-Retriever: Controllable generative text-music retrieval with diffusion models (Accepted at ISMI…☆17Sep 25, 2025Updated 6 months ago
- Forced alignment decoder for Whisper.☆15Mar 13, 2024Updated 2 years ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF.☆25Aug 1, 2025Updated 8 months ago
- Official repository of Myna: Masking-Based Contrastive Learning of Musical Representations☆17Mar 31, 2025Updated last year
- This is the official code used for WAT 2017 Description Paper titled A Bag of Useful Tricks for Practical Neural Machine Translation: Emb…☆12Oct 24, 2017Updated 8 years ago
- ☆11Nov 9, 2022Updated 3 years ago
- ☆18Apr 19, 2024Updated last year
- Lyra V2 (SoundStream) running in the browser☆19Sep 20, 2023Updated 2 years ago
- ☆15Apr 13, 2025Updated last year
- A real time implementation of the ddsp from google magenta.☆16Nov 8, 2021Updated 4 years ago
- Project for MIDI to Audio Synthesis☆27Mar 13, 2023Updated 3 years ago
- [ICLR'25] Official repository for "AVHBench: A Cross-Modal Hallucination Evaluation for Audio-Visual Large Language Models"☆21Mar 8, 2026Updated last month
- applying audio FX with text descriptors☆34Nov 12, 2025Updated 5 months ago
- The official implementation of the paper "Affective Faces for Goal-Driven Dyadic Communication."☆15Jan 27, 2023Updated 3 years ago
- This is the codebase for a unified generative methods (CNN-based, GAN-based, and diffusion-based) for 3D medical cross modality synthesis…☆14Dec 13, 2024Updated last year
- My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"☆14Nov 11, 2024Updated last year
- The multi-modal sequence to sequence baseline neural models used in the Grounded SCAN paper.☆16Mar 21, 2021Updated 5 years ago
- Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986☆49Jan 19, 2026Updated 2 months ago
- [NeurIPS 2025] The official code for "IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation"☆22Jun 5, 2025Updated 10 months ago
- A neural speech codec based on discrete WavLM representations☆26Aug 28, 2024Updated last year
- A collection of pre-trained audio models, in PyTorch.☆116Jan 27, 2023Updated 3 years ago
- [ICCV25] USP: Unified Self-Supervised Pretraining for Image Generation and Understanding☆92Oct 11, 2025Updated 6 months ago
- A PyTorch implementation of NormSoftmax based on BMVC 2019 paper "Classification is a Strong Baseline for Deep Metric Learning"☆10Mar 15, 2020Updated 6 years ago
- ☆15Aug 4, 2024Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year
- ☆54Jul 16, 2025Updated 9 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning'☆100Jul 24, 2024Updated last year
- The assignments from Stanford Natural Language Processing Course on Coursera, started Jan 2016, Finished Jan 2016☆13Jan 28, 2016Updated 10 years ago
- An efficient distillation method for flow matching models☆25Feb 1, 2026Updated 2 months ago
- ☆22Nov 25, 2025Updated 4 months ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil…☆49Mar 19, 2026Updated 3 weeks ago
- Official Implementation of EnCLAP (ICASSP 2024)☆94Jun 2, 2024Updated last year
- Finetuning Stable Diffusion from Diffusers☆11Mar 11, 2024Updated 2 years ago
- My vocoder experiments☆31Jul 26, 2025Updated 8 months ago
- Physics-based Zero-Shot Video Generation☆31Oct 4, 2024Updated last year
- https://wavelandspeech.github.io/☆10Jan 12, 2024Updated 2 years ago
- We introduce OpenStory++, a large-scale open-domain dataset focusing on enabling MLLMs to perform storytelling generation tasks.☆17Aug 30, 2024Updated last year