apple / pytorch-speech-features
☆84 · Updated 11 months ago
Alternatives and similar repositories for pytorch-speech-features:
Users who are interested in pytorch-speech-features are comparing it to the libraries listed below:
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆76 · Updated last year
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆82 · Updated last year
- Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random projection quantizer, in PyTorch. ☆112 · Updated last year
- Implementation of Google's USM speech model in PyTorch ☆30 · Updated last month
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 · Updated 7 months ago
- Audio tokenization, in the fastest way possible! ☆49 · Updated 6 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆74 · Updated 9 months ago
- Audiogen Codec ☆130 · Updated 8 months ago
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆165 · Updated this week
- Official Code for ParrotTTS ☆48 · Updated 5 months ago
- A low-bitrate single-codebook 16 kHz speech codec based on focal modulation ☆79 · Updated last month
- This is the official implementation of our multi-channel multi-speaker multi-spatial neural audio codec architecture. ☆47 · Updated last week
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆83 · Updated 3 months ago
- ☆64 · Updated 6 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆147 · Updated 6 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆117 · Updated 3 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆48 · Updated 8 months ago
- ☆78 · Updated this week
- Official release of StyleTalk dataset. ☆62 · Updated 8 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆53 · Updated last year
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps. ☆120 · Updated this week
- Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models ☆51 · Updated 3 weeks ago
- ☆36 · Updated 6 months ago
- ☆68 · Updated 6 months ago
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey". ☆127 · Updated 2 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆109 · Updated this week
- VoiceLDM: Text-to-Speech with Environmental Context ☆172 · Updated 7 months ago
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆94 · Updated 5 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆110 · Updated this week
- LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆86 · Updated 2 weeks ago