microsoft / WavText5K
Web-crawl for "Audio Retrieval with WavText5K and CLAP Training"
☆49 · Updated last year
Related projects:
- Audio Captioning datasets for PyTorch. ☆98 · Updated 2 weeks ago
- Implementation of the paper "Self-supervised Learning with Random-projection Quantizer for Speech Recognition" in Pytorch. ☆56 · Updated last year
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024) ☆44 · Updated 5 months ago
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models) ☆28 · Updated 3 months ago
- Learning differentiable temporal resolution on time-series data. ☆33 · Updated last year
- Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dial… ☆38 · Updated last week
- Unofficial PyTorch implementation of Masked Autoencoders that Listen ☆61 · Updated 2 years ago
- Audio captioning recipe ☆40 · Updated 2 months ago
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm ☆80 · Updated 11 months ago
- Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study". ☆45 · Updated 2 years ago
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆61 · Updated 2 weeks ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆72 · Updated 3 months ago
- Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆75 · Updated 2 weeks ago
- Code for CVSSP submission to DCASE 2021 Task 6 ☆35 · Updated last year
- This is the official repository of the papers "Parameter-Efficient Transfer Learning of Audio Spectrogram Transformers" and "Efficient Fi… ☆33 · Updated last month
- The open-source code for LLM-Codec ☆106 · Updated last month
- Public code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer ☆82 · Updated 2 years ago
- Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation ☆23 · Updated 6 months ago
- SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model, Accepted to IEEE SLT 2022 ☆108 · Updated last year
- Training code and trained checkpoints for ASGAN. ☆60 · Updated 8 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ☆88 · Updated 3 months ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" ☆32 · Updated last year
- Experiments on AudioSet ☆43 · Updated last year
- Source code for the Interspeech 2024 paper "Scaling up masked audio encoder learning for general audio classification" ☆39 · Updated 2 weeks ago
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ☆146 · Updated 2 months ago