A song aesthetic evaluation toolkit trained on SongEval.
☆283 Jun 15, 2025 Updated 8 months ago
Alternatives and similar repositories for SongEval
Users who are interested in SongEval are comparing it to the libraries listed below
- Official repository of the paper "MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization". ☆311 Aug 4, 2025 Updated 6 months ago
- [ICASSP 2025] AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder ☆12 Mar 11, 2025 Updated 11 months ago
- Audio-FLAN ☆160 Sep 23, 2025 Updated 5 months ago
- MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing [ISMIR 2024] ☆46 Jan 23, 2025 Updated last year
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching ☆45 Feb 9, 2025 Updated last year
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages [ACL 2025] ☆220 May 11, 2025 Updated 9 months ago
- A Massive Contextual Speech Recognition Benchmark. ☆99 Aug 6, 2025 Updated 6 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆482 Nov 23, 2025 Updated 3 months ago
- ☆156 Nov 22, 2024 Updated last year
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆51 May 1, 2025 Updated 10 months ago
- ☆102 Oct 16, 2025 Updated 4 months ago
- [ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s… ☆154 May 30, 2025 Updated 9 months ago
- A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations ☆98 Feb 6, 2026 Updated 3 weeks ago
- ☆36 Sep 6, 2025 Updated 5 months ago
- Unified automatic quality assessment for speech, music, and sound. ☆681 Jun 5, 2025 Updated 8 months ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,251 Nov 27, 2025 Updated 3 months ago
- AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model ☆291 Oct 12, 2025 Updated 4 months ago
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆50 Jul 14, 2024 Updated last year
- ☆15 Aug 22, 2025 Updated 6 months ago
- A Singing Style Conversion Framework Based On Audio Infilling ☆33 Apr 28, 2025 Updated 10 months ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆197 Feb 17, 2026 Updated last week
- ☆99 Jan 19, 2026 Updated last month
- The open source code for LLM-Codec ☆145 Aug 18, 2024 Updated last year
- SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931) ☆63 Dec 23, 2025 Updated 2 months ago
- A TTS Trained on Universal Audio. ☆41 Jun 6, 2025 Updated 8 months ago
- ☆17 Jan 20, 2025 Updated last year
- [ICASSP 2025] FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion ☆91 Jul 23, 2025 Updated 7 months ago
- Speech Resynthesis and Language Modeling ☆27 Jun 11, 2025 Updated 8 months ago
- ☆18 May 4, 2025 Updated 9 months ago
- Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with a better semantic in the latent space. ☆246 Mar 7, 2025 Updated 11 months ago
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators. ☆220 Jan 20, 2026 Updated last month
- Official implementation for FlowSep ☆70 Jan 2, 2025 Updated last year
- Llasa Speed Up ☆60 Jan 18, 2026 Updated last month
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆62 Nov 1, 2024 Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD… ☆76 Jan 25, 2026 Updated last month
- JamendoMaxCaps is a large-scale dataset of 362,000 instrumental creative commons tracks ☆46 May 24, 2025 Updated 9 months ago
- Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Buil… ☆40 Jun 17, 2025 Updated 8 months ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations ☆62 Jan 16, 2025 Updated last year
- Tidy Tunes is an easy-to-use pipeline for mining high-quality audio data for speech generation models. To do so, it chains multiple open … ☆22 Feb 7, 2026 Updated 3 weeks ago