apple / pytorch-speech-features
☆84 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for pytorch-speech-features
- An espeak-compatible, permissively licensed IPA phonemizer (G2P) based on DeepPhonemizer. Usable as a drop-in replacement for espeak's GP… ☆83 · Updated last month
- Transcribing Speech with Multinomial Diffusion: training code and models. ☆75 · Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆93 · Updated last week
- ☆54 · Updated this week
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆192 · Updated last month
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆66 · Updated last week
- Implementation of BEST-RQ, a model for self-supervised learning of speech signals using a random-projection quantizer, in PyTorch. ☆95 · Updated last year
- Code for the paper "GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities" ☆80 · Updated 3 months ago
- Speaker change detection using SincNet and an LSTM/Transformer ☆44 · Updated 4 months ago
- ☆81 · Updated 2 months ago
- Official repository for the "Powerset multi-class cross entropy loss for neural speaker diarization" paper published at Interspeech 2023. ☆71 · Updated last year
- Official release of the StyleTalk dataset. ☆57 · Updated 4 months ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations ☆126 · Updated 8 months ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆134 · Updated 10 months ago
- ☆19 · Updated last year
- ☆32 · Updated 2 months ago
- Contains the code associated with the ICLR submission for our text-to-speech diffusion model ☆50 · Updated last year
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆36 · Updated 3 months ago
- A sequence-to-sequence voice conversion toolkit. ☆86 · Updated 4 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation ☆80 · Updated this week
- The official implementation of PeriodWave and PeriodWave-Turbo ☆132 · Updated 3 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆72 · Updated 5 months ago
- Official implementation of EnCLAP (ICASSP 2024) ☆90 · Updated 5 months ago
- A TTS model that makes a speaker speak new languages ☆75 · Updated 5 months ago
- ☆57 · Updated 2 months ago
- Unified Speech Language Model for the paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆139 · Updated last year
- INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models" ☆105 · Updated 9 months ago
- Libriheavy: a 50,000-hour ASR corpus with punctuation, casing, and context ☆183 · Updated 2 months ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆91 · Updated last month
- Official code for Wav2Seq ☆95 · Updated 2 years ago