apple / pytorch-speech-features
☆84 Updated 11 months ago
Alternatives and similar repositories for pytorch-speech-features:
Users interested in pytorch-speech-features are comparing it to the libraries listed below.
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆82 Updated last year
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆76 Updated last year
- Speaker change detection using SincNet and an LSTM/Transformer ☆47 Updated 8 months ago
- ☆59 Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆113 Updated 3 months ago
- ☆73 Updated last week
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆150 Updated this week
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆74 Updated 9 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆53 Updated last year
- Audio tokenization, in the fastest way possible! ☆49 Updated 6 months ago
- ☆56 Updated 2 years ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch. ☆111 Updated last year
- ☆63 Updated 6 months ago
- ☆36 Updated 5 months ago
- Implementation of Google's USM speech model in Pytorch ☆30 Updated last month
- Official Code for ParrotTTS ☆48 Updated 5 months ago
- ☆19 Updated 2 years ago
- [IJCAI'23] Learning to Speak from Text for Low-Resource TTS ☆63 Updated last year
- Word Discovery in Visually Grounded, Self-Supervised Speech Models ☆26 Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆94 Updated 5 months ago
- A toolkit to calculate speech audio quality. Not affiliated with the original authors ☆51 Updated 7 months ago
- ☆22 Updated last month
- Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context ☆191 Updated 6 months ago
- Speaker identification/verification models for Machine Learning for Computer Vision class at UNIBO ☆61 Updated 2 years ago
- An unofficial PyTorch implementation of VALL-E ☆88 Updated this week
- Open implementation of UNIVERSE and UNIVERSE++ diffusion-based speech enhancement models. ☆91 Updated 6 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆95 Updated 7 months ago