cwx-worst-one / EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆122Updated 3 weeks ago
Alternatives and similar repositories for EAT:
Users who are interested in EAT are comparing it to the repositories listed below.
- This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".☆109Updated this week
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆111Updated last month
- Audio Captioning datasets for PyTorch.☆111Updated 2 months ago
- Unofficial implementation of High Fidelity Neural Audio Compression☆140Updated 5 months ago
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"☆49Updated last week
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model☆131Updated last week
- Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization☆164Updated 6 months ago
- The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.☆77Updated 2 weeks ago
- [AAAI 2024] Code for CTX-vec2wav in UniCATS☆125Updated 7 months ago
- UTokyo-SaruLab MOS Prediction System☆127Updated last month
- Awesome Neural Codec Models, Text-to-Speech Synthesizers & Speech Language Models☆95Updated this week
- Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"☆119Updated last month
- Versatile Evaluation of Speech and Audio☆146Updated 2 weeks ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆140Updated last year
- This package aims at simplifying the download of the AudioCaps dataset.☆31Updated last year
- 🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)☆36Updated 3 months ago
- ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis☆120Updated 3 months ago
- Real-time Speech-Text Foundation Model Toolkit (wip)☆126Updated 3 months ago
- A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)☆51Updated 9 months ago
- Reference-aware automatic speech evaluation toolkit☆139Updated last month
- Official Implementation of Multi-Stage Multi-Codebook (MSMC) TTS☆162Updated 9 months ago
- Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"☆114Updated 3 months ago
- A library built for easier audio self-supervised training and downstream task evaluation☆110Updated 4 months ago
- Source code for Consistent ensemble distillation for audio tagging☆21Updated 6 months ago
- Official Implementation of EnCLAP (ICASSP 2024)☆90Updated 7 months ago
- Evaluation Protocol for Large-Scale Zero-Shot TTS Literature☆69Updated 3 months ago
- This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".☆111Updated 3 months ago