☆61Jan 8, 2025Updated last year
Alternatives and similar repositories for audio-foundation-model-dataset
Users that are interested in audio-foundation-model-dataset are comparing it to the libraries listed below.
- A repository of Japanese Phoneme-Level BERT☆24Dec 16, 2023Updated 2 years ago
- BibTeX style file for the Journal of the Acoustical Society of Japan☆11Jan 24, 2022Updated 4 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆90Dec 20, 2024Updated last year
- Unofficial implementation of wavenext vocoder☆60Aug 28, 2024Updated last year
- Coco-Nut (Corpus of connecting NIHONGO utterance and text) corpus☆21Jun 12, 2024Updated last year
- JATTS: A modern, research-oriented Japanese Text-to-speech Open-sourced Toolkit☆44Mar 13, 2026Updated last month
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.☆64Sep 8, 2025Updated 7 months ago
- Versatile Evaluation of Speech and Audio☆400Dec 9, 2025Updated 4 months ago
- Survey of audio language models☆65Apr 1, 2026Updated 2 weeks ago
- xvector model on jtubespeech☆47Nov 5, 2023Updated 2 years ago
- ☆10May 16, 2024Updated last year
- source code of EfficientTTS 2☆20Feb 18, 2024Updated 2 years ago
- Forced alignment decoder for Whisper.☆15Mar 13, 2024Updated 2 years ago
- Crowdsourced and Automatic Speech Prominence Estimation☆26Apr 12, 2024Updated 2 years ago
- My vocoder experiments☆31Jul 26, 2025Updated 8 months ago
- Reimplementation of Miipher☆30Aug 16, 2023Updated 2 years ago
- UT-Sarulab MOS prediction system using SSL models☆298Apr 11, 2024Updated 2 years ago
- pyopenjtalk-plus: A Python wrapper for OpenJTalk with additional improvements☆57Mar 30, 2026Updated 2 weeks ago
- LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning☆160Jun 13, 2024Updated last year
- DDPM-based Pitch Generation and Pitch Controllable Voice Synthesis.☆55Sep 25, 2023Updated 2 years ago
- A simple and user-friendly tool for computing STFT/DGT☆19Jun 22, 2021Updated 4 years ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"☆37Aug 29, 2023Updated 2 years ago
- a Neural Vocoder supporting Ring Attention, Conformer and NSF.☆25Aug 1, 2025Updated 8 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)☆151Sep 14, 2023Updated 2 years ago
- CTC decoder with hotwords for ASR.☆35Apr 13, 2025Updated last year
- ☆49Jul 22, 2024Updated last year
- ☆36Sep 20, 2022Updated 3 years ago
- Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"☆11Apr 10, 2025Updated last year
- Acoustic signal analysis the easy way: program repository☆27Jan 25, 2024Updated 2 years ago
- Speech Human Evaluation Estimation Toolkit (SHEET)☆134Mar 31, 2026Updated 2 weeks ago
- Multi-lingual AudioCaps☆12Nov 20, 2023Updated 2 years ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv…☆39Jan 6, 2024Updated 2 years ago
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis☆45Mar 2, 2021Updated 5 years ago
- ☆13Oct 11, 2024Updated last year
- Speaker embedding for anime speech domain based on ECAPA_TDNN☆18Jun 22, 2025Updated 9 months ago
- tdmelodic for open-jtalk☆25Aug 30, 2021Updated 4 years ago
- ☆25Jan 24, 2023Updated 3 years ago
- Acoustic measurement using music pieces☆12Aug 5, 2022Updated 3 years ago