XinhaoMei / WavCaps
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
★ 221 · Updated 8 months ago
Alternatives and similar repositories for WavCaps:
Users that are interested in WavCaps are comparing it to the libraries listed below
- Audio Captioning datasets for PyTorch. ★ 115 · Updated 3 weeks ago
- Repository for our NAACL-HLT 2019 paper: AudioCaps ★ 161 · Updated last month
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization ★ 174 · Updated 9 months ago
- An Audio Language model for Audio Tasks ★ 304 · Updated 11 months ago
- This package aims at simplifying the download of the AudioCaps dataset. ★ 33 · Updated last year
- [IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer ★ 144 · Updated 3 months ago
- A 6-million Audio-Caption Paired Dataset Built with a LLMs and ALMs-based Automatic Pipeline ★ 126 · Updated 4 months ago
- Official Implementation of EnCLAP (ICASSP 2024) ★ 91 · Updated 10 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ★ 88 · Updated 4 months ago
- AudioLDM training, finetuning, evaluation and inference. ★ 244 · Updated 4 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions. ★ 120 · Updated this week
- Audio-FLAN ★ 142 · Updated last month
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ★ 122 · Updated 4 months ago
- MU-LLaMA: Music Understanding Large Language Model ★ 274 · Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ★ 193 · Updated 2 weeks ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ★ 140 · Updated last year
- ★ 161 · Updated 9 months ago
- Code for the IEEE Signal Processing Letters 2022 paper "UAVM: Towards Unifying Audio and Visual Models". ★ 54 · Updated last year
- Scripts for download AudioSet ★ 73 · Updated 7 years ago
- The dataset and baseline code for Text-to-Audio Grounding (TAG) ★ 42 · Updated 3 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ★ 198 · Updated 2 weeks ago
- ★ 58 · Updated 3 weeks ago
- ★ 40 · Updated 2 years ago
- Versatile Evaluation of Speech and Audio ★ 183 · Updated this week
- Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generati… ★ 181 · Updated last year
- Audio captioning recipe ★ 46 · Updated 5 months ago
- Web-crawl for "Audio Retrieval with WavText5K and CLAP Training" ★ 49 · Updated 2 years ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification" ★ 59 · Updated 2 months ago
- A Survey of Spoken Dialogue Models (60 pages) ★ 287 · Updated 4 months ago
- Implementation of our paper 'On Metric Learning For Audio-Text Cross-Modal Retrieval' ★ 43 · Updated 2 years ago