microsoft / Pengi
An Audio Language model for Audio Tasks
☆281Updated 5 months ago
Related projects:
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".☆360Updated 4 months ago
- This repository contains metadata for the WavCaps dataset and code for downstream tasks.☆194Updated last month
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆412Updated 3 months ago
- Keep track of big models in the audio domain, including speech, singing, music, etc.☆431Updated 8 months ago
- PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.☆169Updated 3 weeks ago
- The Open Source Code of UniAudio☆509Updated last month
- Learning audio concepts from natural language supervision☆458Updated 3 months ago
- Audio Captioning datasets for PyTorch.☆98Updated 2 weeks ago
- Paper, Code and Statistics for Self-Supervised Learning and Pre-Training on Speech.☆196Updated 8 months ago
- 🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps☆134Updated 4 months ago
- An Open-source Streaming High-fidelity Neural Audio Codec☆402Updated 3 months ago
- AudioLDM training, finetuning, evaluation and inference.☆190Updated 3 months ago
- MU-LLaMA: Music Understanding Large Language Model☆219Updated 5 months ago
- Implementation of Spear-TTS - multi-speaker text-to-speech attention network, in PyTorch☆250Updated 10 months ago
- FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music gener…☆344Updated 7 months ago
- [ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"☆295Updated 2 weeks ago
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer☆99Updated 5 months ago
- Audio Codec Speech processing Universal PERformance Benchmark☆201Updated last week
- Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)☆211Updated this week
- An official reimplementation of the method described in the INTERSPEECH 2021 paper - Speech Resynthesis from Discrete Disentangled Self-S…☆375Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark☆123Updated 3 months ago
- [Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation☆64Updated 3 weeks ago
- This repo hosts the code and models of "Masked Autoencoders that Listen".☆519Updated 5 months ago
- VoiceLDM: Text-to-Speech with Environmental Context☆157Updated last month
- Official implementation of the paper "Acoustic Music Understanding Model with Large-Scale Self-supervised Training".☆293Updated 4 months ago
- A curated list of awesome voice conversion projects and communities.☆169Updated 2 weeks ago
- Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.☆541Updated last month
- Dataset and baseline code for the VocalSound dataset (ICASSP2022).☆106Updated last year
- UniSpeech - Large Scale Self-Supervised Learning for Speech☆419Updated 5 months ago
- CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model☆177Updated 4 months ago