slp-rl / slamkit
SlamKit is an open-source toolkit for efficient training of SpeechLMs. It was used for "Slamming: Training a Speech Language Model on One GPU in a Day".
☆186 · Updated this week
Alternatives and similar repositories for slamkit:
Users who are interested in slamkit are comparing it to the libraries listed below:
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆145 · Updated last month
- Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis ☆235 · Updated 2 weeks ago
- Real-time Speech-Text Foundation Model Toolkit (wip) ☆203 · Updated this week
- LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM ☆210 · Updated last week
- VoiceBench: Benchmarking LLM-Based Voice Assistants ☆151 · Updated 2 weeks ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆68 · Updated 5 months ago
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆164 · Updated 2 months ago
- The official Implementation of PeriodWave and PeriodWave-Turbo ☆179 · Updated last month
- Official implementation of the TTS model Lina-Speech ☆157 · Updated 2 months ago
- Collection of Open Source Speech Data ☆152 · Updated 4 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆118 · Updated 3 months ago
- An unofficial PyTorch implementation of VALL-E ☆87 · Updated this week
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆147 · Updated 6 months ago
- Unified automatic quality assessment for speech, music, and sound. ☆427 · Updated 2 weeks ago
- PyTorch implementation of Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities. ☆400 · Updated this week
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster. ☆60 · Updated 2 weeks ago
- Open TTS models, built for streaming on the edge ☆38 · Updated last week
- A family of state-of-the-art Transformer-based audio codecs for low-bitrate high-quality audio coding. ☆345 · Updated this week
- ☆352 · Updated 6 months ago
- ☆67 · Updated last week
- Audiogen Codec ☆130 · Updated 8 months ago
- ☆254 · Updated last year
- ☆84 · Updated 11 months ago
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆86 · Updated 3 weeks ago
- VoiceRestore: Flow-Matching Transformers for Universal Speech Restoration ☆156 · Updated this week
- VALL-E 2 reproduction ☆122 · Updated 8 months ago
- small audio language model for reasoning ☆49 · Updated last week
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". ☆127 · Updated 2 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ☆120 · Updated this week
- DEX-TTS: Diffusion-based EXpressive TTS with Style Modeling on Time Variability ☆100 · Updated 2 months ago