llm-jp/llama-mimi

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/llm-jp/llama-mimi)

llm-jp / llama-mimi

Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequences of interleaved semantic and acoustic tokens.

☆31

Alternatives and similar repositories for llama-mimi

Users that are interested in llama-mimi are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

speed1313 / jax-llm
View on GitHub
JAX implementation of Large Language Models. You can train GPT-2-like model with 青空文庫 (aozora bunko-clean dataset) or any other text dat…
☆13Aug 5, 2024Updated last year
SakanaAI / kame_finetune
View on GitHub
☆30Jul 16, 2026Updated last week
line / WaveTrainerFit
View on GitHub
Official implementation of "Wave-Trainer-Fit: Neural Vocoder with Trainable Prior and Fixed-Point Iteration towards High-Quality Speech G…
☆16Feb 6, 2026Updated 5 months ago
umbertocappellazzo / Omni-AVSR
View on GitHub
Official Pytorch implementation of "Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models" [IEEE ICASSP 202…
☆38Mar 10, 2026Updated 4 months ago
BUTSpeechFIT / SOT-DiCoW
View on GitHub
Multi-talker ASR based on DiCoW with Serialized Output Training
☆21Sep 18, 2025Updated 10 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
MaikeZuefle / f-actor
View on GitHub
☆28Jul 17, 2026Updated last week
lucadellalib / dycast
View on GitHub
A variable-frame-rate 16 kHz speech codec based on FocalCodec
☆20Feb 11, 2026Updated 5 months ago
yukara-ikemiya / Open-Miipher-2
View on GitHub
PyTorch implementation of Miipher-2 [2025] which is a speech restoration model by Google DeepMind
☆70Sep 22, 2025Updated 10 months ago
yxduir / LLM-SRT
View on GitHub
☆28Mar 11, 2026Updated 4 months ago
Aratako / CALM-DACVAE
View on GitHub
An attempt to reproduce CALM (Continuous Audio Language Models) using DACVAE as the audio VAE.
☆18Feb 20, 2026Updated 5 months ago
Tencent / StableToken
View on GitHub
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
☆33Feb 27, 2026Updated 5 months ago
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
yu-haoyuan / fd-badcat
View on GitHub
fd-sds
☆20Apr 8, 2026Updated 3 months ago
herimor / voxtream
View on GitHub
VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency and Speaking rate Control
☆245May 30, 2026Updated last month
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
AmphionTeam / FlexiCodec
View on GitHub
[ICLR2026] FlexiCodec: A Dynamic Neural Audio Codec for Low Frame Rates
☆51Jul 1, 2026Updated 3 weeks ago
frothywater / kanade-tokenizer
View on GitHub
Kanade is a single-layer disentangled speech tokenizer that extracts compact tokens suitable for both generative and discriminative model…
☆109Jul 18, 2026Updated last week
pfnet-research / jfbench
View on GitHub
☆15Mar 12, 2026Updated 4 months ago
sarulab-speech / DuplexChat
View on GitHub
☆47Jul 5, 2026Updated 3 weeks ago
khfs / DuplexMamba
View on GitHub
☆18Mar 6, 2026Updated 4 months ago
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
slp-rl / PAST
View on GitHub
☆48Jul 7, 2025Updated last year
lucadellalib / discrete-wavlm-codec
View on GitHub
A neural speech codec based on discrete WavLM representations
☆26Aug 28, 2024Updated last year
b-sigpro / sed-hsmm
View on GitHub
Onset-and-Offset-Aware Sound Event Detection
☆21Feb 10, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
shuheikatoinfo / UtterTune
View on GitHub
LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for edit and control the target language's phoneme…
☆26Jul 8, 2026Updated 2 weeks ago
sarulab-speech / SpatialCLAP
View on GitHub
☆19Oct 9, 2025Updated 9 months ago
yuriak / SpeechDialogueFactory
View on GitHub
☆40Apr 3, 2025Updated last year
audiosae / audio-sae
View on GitHub
Demo for AudioSAE paper
☆15Apr 26, 2026Updated 3 months ago
Linyx1125 / MM-F2F
View on GitHub
[ACL 2025] Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
☆34Aug 11, 2025Updated 11 months ago
roudimit / Omni-R1
View on GitHub
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47Nov 21, 2025Updated 8 months ago
IiuZiKai / Evo_TSE
View on GitHub
☆17Apr 9, 2026Updated 3 months ago
nu-dialogue / moshi-finetune
View on GitHub
Fine-tuning Moshi/J-Moshi on your own spoken dialogue data
☆101Jan 5, 2026Updated 6 months ago
lavendery / UUG
View on GitHub
☆21Sep 14, 2025Updated 10 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Alittleegg / Eureka-Audio
View on GitHub
Eureka-Audio: A 1.7B lightweight audio–language model that matches 7B–30B models on ASR, audio understanding, and paralinguistic reasonin…
☆40Apr 11, 2026Updated 3 months ago
llm-jp / text2dataset
View on GitHub
Easily turn large English text datasets into Japanese text datasets using open LLMs.
☆30Jan 20, 2025Updated last year
haoweilou / ParaStyleTTS
View on GitHub
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive …
☆59Dec 21, 2025Updated 7 months ago
JHU-LCAP / FlexSED
View on GitHub
open-vocabulary sound event detection
☆53Dec 17, 2025Updated 7 months ago
ASLP-lab / FlashTTS
View on GitHub
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67Jun 16, 2026Updated last month
unilight / jatts
View on GitHub
JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit
☆43Mar 13, 2026Updated 4 months ago
NKU-HLT / DIFFA
View on GitHub
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆83Apr 7, 2026Updated 3 months ago