☆65Jan 8, 2025Updated last year
Alternatives and similar repositories for audio-foundation-model-dataset
Users that are interested in audio-foundation-model-dataset are comparing it to the libraries listed below.
- A repository of Japanese Phoneme-Level BERT☆24Dec 16, 2023Updated 2 years ago
- BibTeX style file for the Journal of the Acoustical Society of Japan☆11Jan 24, 2022Updated 4 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆92Dec 20, 2024Updated last year
- Unofficial implementation of wavenext vocoder☆59Aug 28, 2024Updated last year
- Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus☆21Jun 12, 2024Updated 2 years ago
- source code of EfficientTTS 2☆21Feb 18, 2024Updated 2 years ago
- JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit☆43Mar 13, 2026Updated 3 months ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Versatile Evaluation of Speech and Audio☆414May 29, 2026Updated 3 weeks ago
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.☆66Sep 8, 2025Updated 9 months ago
- Survey of audio language models☆65Apr 18, 2026Updated 2 months ago
- xvector model on jtubespeech☆47Nov 5, 2023Updated 2 years ago
- ☆10May 16, 2024Updated 2 years ago
- Forced alignment decoder for Whisper.☆16Mar 13, 2024Updated 2 years ago
- Crowdsourced and Automatic Speech Prominence Estimation☆27Apr 12, 2024Updated 2 years ago
- My vocoder experiments☆31Jul 26, 2025Updated 10 months ago
- Reimplementation of Miipher☆30Aug 16, 2023Updated 2 years ago
- UT-Sarulab MOS prediction system using SSL models☆306Apr 11, 2024Updated 2 years ago
- pyopenjtalk-plus: A Python wrapper for OpenJTalk with additional improvements☆58Mar 30, 2026Updated 2 months ago
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning☆161Jun 13, 2024Updated 2 years ago
- DDPM-based Pitch Generation and Pitch Controllable Voice Synthesis.☆55Sep 25, 2023Updated 2 years ago
- A simple and user-friendly tool for computing STFT/DGT☆19Jun 22, 2021Updated 4 years ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"☆37Aug 29, 2023Updated 2 years ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF.☆25Aug 1, 2025Updated 10 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆152Sep 14, 2023Updated 2 years ago
- CTC decoder with hotwords for ASR.☆36Apr 13, 2025Updated last year
- ☆49Jul 22, 2024Updated last year
- ☆37Sep 20, 2022Updated 3 years ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated last year
- Acoustic signal analysis with as little effort as possible: program repository☆28Jan 25, 2024Updated 2 years ago
- Speech Human Evaluation Estimation Toolkit (SHEET)☆135Mar 31, 2026Updated 2 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…☆41Jan 6, 2024Updated 2 years ago
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis☆45Mar 2, 2021Updated 5 years ago
- ☆13Oct 11, 2024Updated last year
- Speaker embedding for anime speech domain based on ECAPA_TDNN☆20Jun 22, 2025Updated 11 months ago
- tdmelodic for open-jtalk☆26Aug 30, 2021Updated 4 years ago
- Multi-lingual AudioCaps☆14Nov 20, 2023Updated 2 years ago
- ☆25Jan 24, 2023Updated 3 years ago
- Acoustic measurement using music pieces☆12Aug 5, 2022Updated 3 years ago