xiaomi-research/dasheng-glap

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/xiaomi-research/dasheng-glap)

xiaomi-research / dasheng-glap

Official Implementation of GLAP - General Language Audio Pretraining

☆75

Alternatives and similar repositories for dasheng-glap

Users that are interested in dasheng-glap are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

xiaomi-research / dasheng-tokenizer
View on GitHub
State-of-the-art continious audio tokenization
☆40Mar 9, 2026Updated 4 months ago
xiaomi-research / mecat
View on GitHub
☆44May 12, 2026Updated 2 months ago
xiaomi-research / dasheng-audiogen
View on GitHub
end-to-end text to audio scene generation model
☆50Jun 16, 2026Updated last month
NieeiM / Dasheng-Audiogen
View on GitHub
Generate a complete audio clip with music, intelligible speech, and sound effects from text in one pass.
☆44May 27, 2026Updated 2 months ago
RicherMans / Dasheng
View on GitHub
Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"
☆86Nov 7, 2025Updated 8 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
XiaoMi / dasheng
View on GitHub
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
☆200Nov 7, 2025Updated 8 months ago
xiaomi-research / acavcaps
View on GitHub
☆31Mar 27, 2026Updated 4 months ago
yfyeung / CLSP
View on GitHub
[ACL 2026 Main] Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training
☆104Apr 6, 2026Updated 3 months ago
xiaomi-research / dasheng-denoiser
View on GitHub
Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…
☆81Jun 16, 2025Updated last year
yanghaha0908 / WavCube
View on GitHub
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62Jun 27, 2026Updated last month
wsntxxn / UniFlow-Audio
View on GitHub
☆74Jul 17, 2026Updated last week
xiaomi-research / xares-llm
View on GitHub
XARES-LLM
☆55Mar 26, 2026Updated 4 months ago
xiquan-li / FineLAP
View on GitHub
[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training
☆36Apr 20, 2026Updated 3 months ago
wonjune-kang / expressive-speech-retrieval
View on GitHub
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15Aug 18, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 5 months ago
RicherMans / CED
View on GitHub
Source code for Consistent ensemble distillation for audio tagging
☆75Mar 20, 2026Updated 4 months ago
xiquan-li / MeanAudio
View on GitHub
[ACL 2026 Main] MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
☆145Sep 2, 2025Updated 10 months ago
SonyResearch / VRVQ
View on GitHub
Variable Bitrate Residual Vector Quantization for Audio Coding
☆54May 1, 2025Updated last year
xiaomi-research / dasheng-lm
View on GitHub
Efficient audio understanding with general audio captions
☆429Apr 24, 2026Updated 3 months ago
k2-fsa / Flow2GAN
View on GitHub
Hybrid Flow Matching and GAN with Multi-Resolution Network for Few-Step High-Fidelity Audio Generation
☆146Mar 8, 2026Updated 4 months ago
zeyuxie29 / SemanticVocoder
View on GitHub
☆28Apr 6, 2026Updated 3 months ago
gzhu06 / Cacophony
View on GitHub
Inference codebase for "Cacophony: An Improved Contrastive Audio-Text Model". Preprint: https://arxiv.org/abs/2402.06986
☆49Jan 19, 2026Updated 6 months ago
ASLP-lab / FlashTTS
View on GitHub
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆67Jun 16, 2026Updated last month
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
thuhcsi / SpeechCraft
View on GitHub
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆198Feb 28, 2026Updated 5 months ago
FrontierLabs / F5R-TTS
View on GitHub
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
☆169Mar 3, 2026Updated 4 months ago
bovod-sjtu / HoliTok
View on GitHub
HoliTok:A Coutinuous Holistic Tokenization with Robust Dual Capabilities of Speech Generation and Understanding
☆39Jun 8, 2026Updated last month
KeiKinn / ParaCLAP
View on GitHub
Towards a general language-audio model for computational paralinguistic tasks
☆31Dec 14, 2024Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121Jun 21, 2026Updated last month
gwh22 / LAFMA
View on GitHub
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)
☆44Jun 13, 2024Updated 2 years ago
SWivid / AUV
View on GitHub
An All-in-One Speech, Sound, Music Codec with Single Nested Codebook
☆28Oct 11, 2025Updated 9 months ago
Ming-er / MGA-CLAP
View on GitHub
official implementation of MGA-CLAP (ACM MM 2024)
☆29Oct 25, 2024Updated last year
kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆141Feb 4, 2026Updated 5 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ZhikangNiu / A-DMA
View on GitHub
[INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
☆67Jun 16, 2025Updated last year
xiquan-li / Awesome-Audio-Generation
View on GitHub
Curated list for papers, codes and resources related to Text-to-Audio (TTA) Generation
☆75Jul 20, 2026Updated last week
zeyuxie29 / PicoAudio
View on GitHub
☆45Jan 13, 2025Updated last year
jimbozhang / xares
View on GitHub
A benchmark for evaluating audio encoders on various audio tasks.
☆55Apr 27, 2026Updated 3 months ago
Jiang-Yidi / UniCodec
View on GitHub
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and s…
☆157May 30, 2025Updated last year
ETH-DISCO / sao-instruct
View on GitHub
Official repo for SAO-Instruct: Free-form Audio Editing using Natural Language Instructions presented at NeurIPS 2025
☆18Oct 28, 2025Updated 9 months ago
facebookresearch / audiobox-aesthetics
View on GitHub
Unified automatic quality assessment for speech, music, and sound.
☆747Jun 5, 2025Updated last year