Audio generation model that applies GPT-2 to a VQ-VAE-compressed representation of mel spectrograms
☆18 · Oct 8, 2023 · Updated 2 years ago
Alternatives and similar repositories for MelSpec_GPT_VQVAE
Users who are interested in MelSpec_GPT_VQVAE are comparing it to the repositories listed below.
- PyTorch implementation of Retriever: Learning Content-Style Representation ☆12 · Jan 27, 2023 · Updated 3 years ago
- 🎵 Partnership with AI to create Beats ☆11 · Oct 13, 2020 · Updated 5 years ago
- ☆37 · May 8, 2021 · Updated 5 years ago
- A repository comprising code for generating noisy speech data from clean data using deep learning methods ☆16 · Jul 12, 2021 · Updated 4 years ago
- TTS front end: text normalization, converting numbers and the like into Chinese characters ☆12 · Apr 27, 2024 · Updated 2 years ago
- Official repository for "Structure-Enhanced Pop Music Generation via Harmony-Aware Learning", ACM MM 2022. ☆14 · Mar 22, 2023 · Updated 3 years ago
- Wave-U-Net for automatic (drum) mixing ☆38 · Mar 24, 2023 · Updated 3 years ago
- Crawled from FreeMidi.org, MIDI files library including over twenty thousand files! ☆32 · Jun 6, 2020 · Updated 6 years ago
- Python wrapper for the HTS engine ☆14 · Jan 30, 2018 · Updated 8 years ago
- ☆12 · Jul 6, 2023 · Updated 2 years ago
- ☆25 · Apr 24, 2019 · Updated 7 years ago
- ☆15 · May 8, 2021 · Updated 5 years ago
- ☆19 · Feb 2, 2023 · Updated 3 years ago
- The source code for the paper CrossSinger (ASRU 2023) ☆18 · Oct 12, 2023 · Updated 2 years ago
- ☆69 · Mar 31, 2021 · Updated 5 years ago
- ☆44 · Jun 10, 2024 · Updated 2 years ago
- Code for paper A3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing ☆89 · Sep 6, 2024 · Updated last year
- ICASSP 2021 accepted papers on voice conversion (VC) ☆18 · Apr 11, 2021 · Updated 5 years ago
- ERISHA is a multilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for… ☆44 · Dec 17, 2020 · Updated 5 years ago
- A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability. ☆118 · Jun 4, 2025 · Updated last year
- A fundamental frequency estimation algorithm using features from the magnitude and phase spectrogram. ☆24 · Mar 29, 2021 · Updated 5 years ago
- The DJ Mix Dataset ☆18 · Sep 7, 2022 · Updated 3 years ago
- Please visit https://thuhcsi.github.io/SnakeGAN/ ☆37 · Apr 25, 2023 · Updated 3 years ago
- Please visit: https://thuhcsi.github.io/icassp2021-emotion-tts/ ☆34 · Mar 17, 2023 · Updated 3 years ago
- Phonemes and durations labeling based on whisper small ☆11 · Jul 7, 2024 · Updated last year
- ISMIR 2020 Paper repo: Music SketchNet: Controllable Music Generation via Factorized Representations of Pitch and Rhythm ☆83 · Oct 3, 2023 · Updated 2 years ago
- ☆35 · Sep 7, 2022 · Updated 3 years ago
- ☆55 · Jan 13, 2023 · Updated 3 years ago
- Realtime (streaming) DDSP in PyTorch compatible with neutone ☆51 · Feb 4, 2025 · Updated last year
- ☆81 · Aug 8, 2025 · Updated 10 months ago
- ☆45 · Dec 16, 2019 · Updated 6 years ago
- ☆11 · May 7, 2022 · Updated 4 years ago
- This is an unofficial implementation of universal melgan according to https://arxiv.org/abs/2011.09631 ☆23 · Aug 15, 2022 · Updated 3 years ago
- visual-text to speech ☆14 · Apr 3, 2022 · Updated 4 years ago
- [ICCV'21] The Right to Talk: An Audio-Visual Transformer Approach ☆20 · Aug 2, 2021 · Updated 4 years ago
- ☆26 · Sep 22, 2022 · Updated 3 years ago
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024) ☆26 · Feb 22, 2024 · Updated 2 years ago
- Real-time melgan based on CPU ☆13 · Dec 3, 2019 · Updated 6 years ago
- ☆180 · Oct 24, 2023 · Updated 2 years ago