nervjack2/MelHuBERT

Official implementation of MelHuBERT

☆70

Alternatives and similar repositories for MelHuBERT

Users who are interested in MelHuBERT are comparing it to the libraries listed below.

nervjack2 / Speech2Unit
☆13 · Sep 25, 2024 · Updated last year
B06901052 / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13 · Oct 11, 2022 · Updated 3 years ago
voidful / Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
☆308 · Jul 4, 2026 · Updated 3 weeks ago
ga642381 / SpeechPrompt
Interspeech 2022: "SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks" Speec…
☆102 · Apr 10, 2025 · Updated last year
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48 · Sep 4, 2023 · Updated 2 years ago
Splend1d / T5lephone
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19 · Nov 29, 2022 · Updated 3 years ago
AlanBaade / SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63 · Jul 1, 2025 · Updated last year
0nutation / USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)
☆152 · Sep 14, 2023 · Updated 2 years ago
JSALT-2022-SSL / superb-prosody
☆31 · Jul 13, 2023 · Updated 3 years ago
p1an-lin-jung / wv_tts
☆19 · Mar 22, 2024 · Updated 2 years ago
yanghaha0908 / FastHuBERT
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
☆100 · Nov 20, 2024 · Updated last year
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
ankitapasad / layerwise-analysis
Layer-wise analysis of self-supervised pre-trained speech representations
☆135 · Oct 18, 2024 · Updated last year
voidful / asrp
ASR text preprocessing utility
☆21 · Aug 5, 2024 · Updated last year
X-LANCE / StoryTTS
[ICASSP 2024] StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations
☆141 · Apr 27, 2024 · Updated 2 years ago
pyf98 / DPHuBERT
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
☆118 · Jan 26, 2024 · Updated 2 years ago
ga642381 / SpeechGen
"SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts"
☆77 · Jun 9, 2023 · Updated 3 years ago
p0p4k / Matcha-TTS-2
E2E TTS using Conditional Flow Matching (Experimental*)
☆71 · Nov 10, 2023 · Updated 2 years ago
asuni / PitchSqueezer
A robust pitch tracker using synchro-squeezed FFT and frequency domain autocorrelation
☆38 · Jan 17, 2024 · Updated 2 years ago
adelacvg / diff-vits
☆39 · Oct 1, 2023 · Updated 2 years ago
0nutation / SLMTokBench
SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"
☆37 · Aug 29, 2023 · Updated 2 years ago
adelacvg / ttts
Train the next generation of TTS systems.
☆169 · Sep 13, 2024 · Updated last year
George0828Zhang / torch_cif
A fast parallel PyTorch implementation of the "CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition" https://arxiv.org/ab…
☆37 · Feb 10, 2024 · Updated 2 years ago
Aria-K-Alethia / laughter-synthesis
Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" acc…
☆77 · Jul 16, 2023 · Updated 3 years ago
haiciyang / LaDiffCodec
ICASSP 2024 - Generative De-Quantization for Neural Speech Codec via Latent Diffusion.
☆56 · Nov 16, 2025 · Updated 8 months ago
AI-S2-Lab / FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆62 · Oct 23, 2024 · Updated last year
slp-rl / SLM-Discrete-Representations
This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M…
☆20 · Jan 3, 2023 · Updated 3 years ago
lifeiteng / NaturalSpeech2
☆33 · Jun 29, 2023 · Updated 3 years ago
mct10 / RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
☆196 · Jul 12, 2024 · Updated 2 years ago
ex3ndr / supervoice-gpt-facodec
GPT for FACodec
☆13 · Mar 25, 2024 · Updated 2 years ago
sungnyun / ARMHuBERT
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41 · Aug 29, 2024 · Updated last year
howard1337 / S2VC
☆100 · Jul 22, 2021 · Updated 5 years ago
asappresearch / simple-tts
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
☆57 · Oct 31, 2023 · Updated 2 years ago
dhchoi99 / NANSY
☆171 · Jul 25, 2022 · Updated 4 years ago
Srijith-rkr / Whispering-LLaMA
EMNLP 23 - Integrating Whisper Encoder to LLaMA Decoder for Generative ASR Error Correction
☆271 · May 19, 2024 · Updated 2 years ago
ShovalMessica / NAST
Official repository for NAST: Noise Aware Speech Tokenization for Speech Language Models (Interspeech 2024) https://arxiv.org/abs/2406.11…
☆46 · Jul 2, 2024 · Updated 2 years ago
ZhangXInFD / soundstorm-speechtokenizer
Implementation of SoundStorm built upon SpeechTokenizer.
☆116 · Nov 2, 2023 · Updated 2 years ago
mechanicalsea / lighthubert
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
☆73 · Sep 26, 2022 · Updated 3 years ago
JusperLee / Gull-Codec-Training
☆12 · Mar 11, 2025 · Updated last year