microsoft / Pengi
An Audio Language model for Audio Tasks
☆290 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for Pengi
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆386 · Updated 6 months ago
- This repository contains the metadata of the WavCaps dataset and code for downstream tasks. ☆205 · Updated 3 months ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities. ☆192 · Updated last month
- Code for SpeechTokenizer, presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples a… ☆477 · Updated 5 months ago
- Keeps track of large models in the audio domain, including speech, singing, music, etc. ☆457 · Updated last month
- The Open Source Code of UniAudio ☆522 · Updated 3 months ago
- Audio Captioning datasets for PyTorch. ☆107 · Updated 2 weeks ago
- An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S… ☆390 · Updated last year
- 🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps ☆144 · Updated 6 months ago
- Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech. ☆201 · Updated 10 months ago
- MU-LLaMA: Music Understanding Large Language Model ☆236 · Updated 7 months ago
- An Open-source Streaming High-fidelity Neural Audio Codec ☆444 · Updated 3 weeks ago
- VoiceLDM: Text-to-Speech with Environmental Context ☆163 · Updated 3 months ago
- Audio Large Language Models ☆136 · Updated this week
- AudioLDM training, finetuning, evaluation and inference. ☆210 · Updated 5 months ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching" ☆310 · Updated 2 months ago
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆146 · Updated 5 months ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ☆114 · Updated 7 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener… ☆369 · Updated 9 months ago
- Audio Codec Speech processing Universal PERformance Benchmark ☆220 · Updated 2 weeks ago
- Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training". ☆310 · Updated 7 months ago
- EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction ☆232 · Updated 6 months ago
- Learning audio concepts from natural language supervision ☆487 · Updated 2 months ago
- PyTorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" ☆182 · Updated last year
- A curated list of awesome voice conversion projects and communities. ☆200 · Updated this week
- Dataset and baseline code for the VocalSound dataset (ICASSP 2022). ☆123 · Updated 2 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆139 · Updated last year
- CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus ☆183 · Updated 2 years ago
- Implementation of Spear-TTS, a multi-speaker text-to-speech attention network, in PyTorch ☆257 · Updated last year
- This repo hosts the code and model of "Separate What You Describe: Language-Queried Audio Source Separation", Interspeech 2022 ☆140 · Updated last year