apple / dmel-demo
dMel: Speech Tokenization Made Simple
☆13 · Updated last month
Alternatives and similar repositories for dmel-demo
Users that are interested in dmel-demo are comparing it to the libraries listed below
- Training hybrid models for dummies. ☆23 · Updated 5 months ago
- ☆38 · Updated last month
- ☆18 · Updated 2 months ago
- Rust crate for some audio utilities ☆24 · Updated 3 months ago
- GoldFinch and other hybrid transformer components ☆10 · Updated last month
- Declare your datasets and download them using a simple tool ☆10 · Updated 10 months ago
- ☆21 · Updated 3 months ago
- Audio Entailment: Deductive Reasoning for Audio Understanding ☆13 · Updated 6 months ago
- A small Rust-based data loader ☆29 · Updated 2 weeks ago
- An open-source replication of the Strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆29 · Updated this week
- Acoustic Neighbor Embeddings ☆24 · Updated 6 months ago
- ☆10 · Updated last year
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆19 · Updated 5 months ago
- Implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Updated 11 months ago
- Run ONNX RWKV-v4 models with GPU acceleration using DirectML [Windows], or just on CPU [Windows AND Linux]; Limited to 430M model at this… ☆21 · Updated 2 years ago
- Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in MLX ☆20 · Updated 8 months ago
- Basic Denoising Diffusion Probabilistic Model image generator implemented in PyTorch ☆10 · Updated 5 months ago
- Proof of concept for running moshi/hibiki using webrtc ☆19 · Updated 4 months ago
- ☆13 · Updated 9 months ago
- [Early Alpha] A unified framework for text-to-speech, voice conversion, automatic speech recognition, audio classification, voice activit… ☆21 · Updated 5 months ago
- Trying to build an all in one speech-text language model - a bit like GPT-4o ☆22 · Updated last year
- Implementation of 'Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis', in MLX ☆20 · Updated 7 months ago
- Generate audio datasets for training Text-To-Speech models, through smart audio splitting with silence detection, and transcription using… ☆28 · Updated 2 years ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆50 · Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆14 · Updated last year
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 · Updated last year
- JAX Implementations of Descript Audio Codec and EnCodec ☆29 · Updated 2 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated last month
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 · Updated last month
- IPA Phonemizer/Dephonemizer for 139 human languages ☆27 · Updated 2 months ago