Alexander-H-Liu/dinosr

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Alexander-H-Liu/dinosr)

Alexander-H-Liu / dinosr

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

☆53

Alternatives and similar repositories for dinosr

Users that are interested in dinosr are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

jasonppy / word-discovery
View on GitHub
Word Discovery in Visually Grounded, Self-Supervised Speech Models
☆27Dec 4, 2023Updated 2 years ago
ftshijt / speech_evaluation
View on GitHub
A toolkit dedicate for speech evaluation.
☆23Sep 26, 2024Updated last year
sungnyun / ARMHuBERT
View on GitHub
(Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT
☆41Aug 29, 2024Updated last year
yanghaha0908 / FastHuBERT
View on GitHub
Official implementation for Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning
☆100Nov 20, 2024Updated last year
JusperLee / Gull-Codec-Training
View on GitHub
☆12Mar 11, 2025Updated last year
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
AlanBaade / MAE-AST-Public
View on GitHub
Public Code for the paper MAE-AST: Masked Autoencoding Audio Spectrogram Transformer
☆93Jun 9, 2022Updated 4 years ago
AlanBaade / SyllableLM
View on GitHub
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63Jul 1, 2025Updated last year
Splend1d / T5lephone
View on GitHub
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19Nov 29, 2022Updated 3 years ago
mubtasimahasan / DM-Codec
View on GitHub
Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”
☆57Jun 1, 2025Updated last year
lucadellalib / discrete-wavlm-codec
View on GitHub
A neural speech codec based on discrete WavLM representations
☆26Aug 28, 2024Updated last year
jasonppy / FaST-VGS-Family
View on GitHub
Transformer-based visually grounded speech models
☆19Sep 22, 2022Updated 3 years ago
ga642381 / AudioCodec-Hub
View on GitHub
AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models
☆25Sep 26, 2023Updated 2 years ago
leto19 / WhiSQA
View on GitHub
Whisper Speech Quality Assessment (WhiSQA)
☆16Apr 14, 2026Updated 3 months ago
YuanGongND / ltu
View on GitHub
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478Apr 24, 2024Updated 2 years ago
Open source password manager - Proton Pass • Ad
Securely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
pyf98 / DPHuBERT
View on GitHub
INTERSPEECH 2023: "DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models"
☆118Jan 26, 2024Updated 2 years ago
ga642381 / SpeechGen
View on GitHub
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77Jun 9, 2023Updated 3 years ago
roger-tseng / av-superb
View on GitHub
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58Apr 17, 2024Updated 2 years ago
Respaired / RiFornet_Vocoder
View on GitHub
a Neural Vocoder supporting Ring Attention, Conformer and NSF.
☆25Aug 1, 2025Updated 11 months ago
yzGuu830 / efficient-speech-codec
View on GitHub
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126Mar 20, 2025Updated last year
kuan2jiu99 / audio-hallucination
View on GitHub
Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024
☆34Mar 14, 2025Updated last year
30stomercury / hmm-backprop
View on GitHub
Fast and differentiable hidden Markov model in C++
☆19Jan 20, 2023Updated 3 years ago
facebookresearch / spidr
View on GitHub
This repository contains the training code from paper "SpidR Learning Fast and Stable Linguistic Units for Spoken Language Models Without…
☆57Jul 22, 2026Updated last week
yangdongchao / ALMTokenizer2
View on GitHub
The open source code of ALMTokenizer2: Towards Low bit-rate and Semantic-rich Audio Tokenizer with Flow-based Scalar Diffusion Transforme…
☆45Sep 5, 2025Updated 10 months ago
Proton VPN Special Offer - Get 70% off • Ad
Special partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
ga642381 / SpeechPrompt-v2
View on GitHub
《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm
☆81Oct 19, 2023Updated 2 years ago
AmphionTeam / SD-Eval
View on GitHub
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
☆57Jun 25, 2024Updated 2 years ago
lwang114 / GraphUnsupASR
View on GitHub
☆10Apr 17, 2024Updated 2 years ago
apple / pytorch-speech-features
View on GitHub
☆87Apr 2, 2024Updated 2 years ago
GasserElbanna / serab-byols
View on GitHub
(Hybrid) BYOL-S feature extractor using serab-byols package in pytorch.
☆27Apr 20, 2024Updated 2 years ago
kamperh / vqwordseg
View on GitHub
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.
☆39May 5, 2026Updated 2 months ago
cantabile-kwok / vec2wav2.0
View on GitHub
Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995
☆79Dec 3, 2024Updated last year
aqtq314 / VogenSVS
View on GitHub
☆15Apr 16, 2026Updated 3 months ago
reppy4620 / convnext_tts
View on GitHub
Unofficial implementation of ConvNeXt-TTS powered by lightning
☆18Oct 20, 2024Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Chung-I / youtube-asr-crawler
View on GitHub
☆10Sep 19, 2022Updated 3 years ago
mct10 / RepCodec
View on GitHub
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
☆196Jul 12, 2024Updated 2 years ago
OsciiArt / Freesound-Audio-Tagging-2019
View on GitHub
☆19Sep 21, 2019Updated 6 years ago
Aria-K-Alethia / BigCodec
View on GitHub
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218Sep 19, 2024Updated last year
AI-S2-Lab / GPT-Talker
View on GitHub
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
☆45Oct 28, 2024Updated last year
MuyangDu / T5Voice
View on GitHub
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech …
☆28Nov 7, 2025Updated 8 months ago
kjw11 / CSEnet-ASR
View on GitHub
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
☆12Mar 14, 2025Updated last year