SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
☆228 · May 18, 2025 · Updated 9 months ago
Alternatives and similar repositories for slamkit
Users who are interested in slamkit are comparing it to the libraries listed below.
- The official code for the SALMon🍣 benchmark (ICASSP 2025 - Oral) ☆49 · Aug 15, 2025 · Updated 6 months ago
- Official repository for "Speaking Style Conversion With Discrete Self-Supervised Units" (EMNLP 2023). https://arxiv.org/abs/2212.09730 ☆131 · Dec 8, 2023 · Updated 2 years ago
- Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11… ☆46 · Jul 2, 2024 · Updated last year
- Official implementation of "Dataset Size Recovery from LoRA Weights" paper. ☆34 · Jun 30, 2024 · Updated last year
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models ☆35 · Oct 13, 2024 · Updated last year
- The official repo of the paper "StressTest: Can YOUR Speech LM Handle the Stress?" ☆20 · Jul 9, 2025 · Updated 7 months ago
- Official PyTorch Implementation for the "Unsupervised Model Tree Heritage Recovery" paper (ICLR 2025). ☆63 · Jul 1, 2025 · Updated 8 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆155 · Mar 24, 2025 · Updated 11 months ago
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models ☆61 · Jul 1, 2025 · Updated 8 months ago
- Interface Design for Self-Supervised Speech Models, accepted to Interspeech 2024 ☆16 · Nov 19, 2024 · Updated last year
- [NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words ☆56 · Jun 25, 2024 · Updated last year
- ☆46 · Jul 7, 2025 · Updated 7 months ago
- Python scripts to create noisy and reverberant 2-speaker mixture audio with Libri-Light and WHAM ☆17 · Nov 7, 2024 · Updated last year
- [AAAI 2025] Official Implementation for "Click2Mask: Local Editing with Dynamic Mask Generation" Paper. ☆20 · Jan 22, 2026 · Updated last month
- TEAL: New Selection Strategy for Small Buffers in Experience Replay Class Incremental Learning ☆17 · Jan 21, 2025 · Updated last year
- Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995 ☆78 · Dec 3, 2024 · Updated last year
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆333 · Jan 29, 2026 · Updated last month
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆253 · Mar 26, 2025 · Updated 11 months ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF. ☆24 · Aug 1, 2025 · Updated 7 months ago
- ☆19 · Jan 8, 2025 · Updated last year
- Official code for Interspeech 2023 paper "Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clusterin… ☆64 · May 19, 2023 · Updated 2 years ago
- ☆387 · Oct 3, 2025 · Updated 5 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆125 · Mar 20, 2025 · Updated 11 months ago
- ☆16 · Dec 18, 2023 · Updated 2 years ago
- Versatile Evaluation of Speech and Audio ☆392 · Dec 9, 2025 · Updated 2 months ago
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM ☆368 · May 27, 2025 · Updated 9 months ago
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆214 · Sep 10, 2024 · Updated last year
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆419 · Feb 12, 2026 · Updated 2 weeks ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆192 · Jul 12, 2024 · Updated last year
- Speech Human Evaluation Estimation Toolkit (SHEET) ☆132 · Oct 2, 2025 · Updated 5 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization ☆110 · May 20, 2025 · Updated 9 months ago
- Spatial Voice Conversion: Voice Conversion Preserving Spatial Information and Non-target Signals ☆18 · Aug 8, 2024 · Updated last year
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ☆647 · Jun 9, 2024 · Updated last year
- SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis ☆147 · Jan 1, 2025 · Updated last year
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆218 · Feb 28, 2025 · Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆974 · Jan 15, 2026 · Updated last month
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆367 · Sep 3, 2024 · Updated last year
- ☆258 · Mar 15, 2024 · Updated last year
- ☆59 · Oct 22, 2025 · Updated 4 months ago