apple / pytorch-speech-features
☆84 Updated 10 months ago
Alternatives and similar repositories for pytorch-speech-features:
Users who are interested in pytorch-speech-features are comparing it to the libraries listed below.
- Transcribing Speech with Multinomial Diffusion, training code and models. ☆76 Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆109 Updated 2 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆46 Updated 7 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆51 Updated last year
- An espeak-compatible, permissively-licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆93 Updated 4 months ago
- ☆70 Updated 2 months ago
- ☆36 Updated 5 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆75 Updated 8 months ago
- Official Code for ParrotTTS ☆49 Updated 4 months ago
- Official release of StyleTalk dataset. ☆61 Updated 7 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec" ☆132 Updated 5 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published in Interspeech 2023. ☆80 Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆124 Updated this week
- ☆19 Updated last year
- A TTS model that makes a speaker speak new languages ☆76 Updated 8 months ago
- Implementation of Google's USM speech model in Pytorch ☆28 Updated 3 weeks ago
- ☆21 Updated 2 weeks ago
- Audio tokenization, in the fastest way possible! ☆48 Updated 5 months ago
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 Updated 6 months ago
- ☆63 Updated 5 months ago
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆76 Updated last month
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆100 Updated 2 months ago
- Codebase for the paper 'EncodecMAE: Leveraging neural codecs for universal audio representation learning' ☆93 Updated 6 months ago
- Implementation of BEST-RQ - a model for self-supervised learning of speech signals using a random projection quantizer, in Pytorch. ☆111 Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆15 Updated 3 months ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆226 Updated 4 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆90 Updated 8 months ago
- Audiogen Codec ☆131 Updated 7 months ago
- GPT-style network for phonemization with durations of text ☆63 Updated 11 months ago