apple / pytorch-speech-features
☆84 · Updated last year
Alternatives and similar repositories for pytorch-speech-features:
Users who are interested in pytorch-speech-features are comparing it to the libraries listed below.
- Official Code for ParrotTTS ☆50 · Updated 6 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆83 · Updated last year
- Audio tokenization, in the fastest way possible! ☆51 · Updated 8 months ago
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆76 · Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆17 · Updated 5 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆86 · Updated 4 months ago
- ☆38 · Updated 7 months ago
- ☆59 · Updated last year
- VoiceLDM: Text-to-Speech with Environmental Context ☆175 · Updated 8 months ago
- ☆65 · Updated 7 months ago
- ☆92 · Updated this week
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 · Updated 9 months ago
- Implementation of Google's USM speech model in Pytorch ☆31 · Updated last month
- [ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching" ☆61 · Updated 3 months ago
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆49 · Updated last month
- ☆31 · Updated last month
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions" ☆39 · Updated 3 weeks ago
- Autovocoder: Fast Waveform Generation from a Learned Speech Representation using Differentiable Digital Signal Processing ☆70 · Updated 2 years ago
- Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models. ☆94 · Updated 8 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆96 · Updated 9 months ago
- A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos. ☆51 · Updated 6 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆91 · Updated 11 months ago
- A TTS model that makes a speaker speak new languages ☆76 · Updated 10 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆118 · Updated last month
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆53 · Updated last year
- Audio-visual diarization pipeline used for creating VoxConverse dataset ☆21 · Updated 2 months ago
- Official implementation for FlowSep ☆45 · Updated 4 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion ☆89 · Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆124 · Updated 5 months ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation … ☆64 · Updated 4 months ago