slp-rl / slamkit
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
☆228Updated 8 months ago
Alternatives and similar repositories for slamkit
Users that are interested in slamkit are comparing it to the libraries listed below
- ☆346Updated 3 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆196Updated 3 months ago
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM☆293Updated 8 months ago
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis☆343Updated 5 months ago
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.☆186Updated last month
- Collection of Open Source Speech Data☆164Updated 3 months ago
- Official implementation of the TTS model Lina-Speech☆175Updated last year
- LongCat Audio Tokenizer and Detokenizer☆272Updated last week
- DACVAE☆187Updated 3 weeks ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆132Updated 5 months ago
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation☆146Updated 8 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip)☆250Updated 9 months ago
- A package for NeuCodec: a 50 Hz, 0.8 kbps, 24 kHz audio codec.☆143Updated 3 months ago
- ☆385Updated last year
- ☆254Updated 8 months ago
- Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation☆196Updated 5 months ago
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"☆141Updated 7 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.☆196Updated 6 months ago
- VoiceBench: Benchmarking LLM-Based Voice Assistants☆319Updated last month
- ☆105Updated 3 months ago
- Code for the blog "Neural audio codecs: how to get audio into LLMs"☆145Updated 2 months ago
- ☆42Updated 4 months ago
- small audio language model for reasoning☆84Updated last month
- VoXtream is a Full-Stream Zero-shot TTS model with Extremely Low Latency☆181Updated 2 months ago
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances …☆85Updated 6 months ago
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆92Updated 3 months ago
- Text-audio foundation model from Boson AI☆116Updated 4 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆81Updated last year
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec…☆116Updated 4 months ago
- Provides Gradio custom components to make the diarization-based audio labeling process easier and faster.☆69Updated 2 months ago