NKU-HLT/DIFFA

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/NKU-HLT/DIFFA)

NKU-HLT / DIFFA

[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model

☆83

Alternatives and similar repositories for DIFFA

Users that are interested in DIFFA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

NKU-HLT / SpeechLLM-as-Judges
View on GitHub
[ACL 2026]
☆25Dec 6, 2025Updated 7 months ago
NKU-HLT / KNN-CTC
View on GitHub
[ICASSP 2024] KNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
☆42Mar 20, 2024Updated 2 years ago
NKU-HLT / RAMP_MOS
View on GitHub
[IEEE TASLP] Retrieval-Augmented MOS Prediction with Prior Knowledge Integration
☆33Mar 23, 2025Updated last year
NKU-HLT / MusicEval-baseline
View on GitHub
☆12Apr 18, 2025Updated last year
ASLP-lab / M7-TTS
View on GitHub
M7-TTS: A Mini-Scale Multilingual and Multi-Dialect Text-to-Speech Language Model with Mimi codec and Multi Token Prediction
☆20Mar 19, 2026Updated 4 months ago
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
yangdongchao / ALMTokenizer
View on GitHub
The demo page for ALMTokenizer
☆59Apr 14, 2025Updated last year
lysanderism / TimeAudio
View on GitHub
The official repository TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30Nov 18, 2025Updated 8 months ago
y-ren16 / OV-InstructTTS
View on GitHub
☆22Jan 27, 2026Updated 6 months ago
roudimit / Omni-R1
View on GitHub
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
☆47Nov 21, 2025Updated 8 months ago
GLJS / AudioToolAgent
View on GitHub
GitHub repository for AudioToolAgent
☆20Feb 13, 2026Updated 5 months ago
NKU-HLT / Role-Play-Prompting
View on GitHub
[NAACL 2024] Better Zero-Shot Reasoning with Role-Play Prompting
☆36Nov 14, 2023Updated 2 years ago
xiaomi-research / dasheng-tokenizer
View on GitHub
State-of-the-art continious audio tokenization
☆40Mar 9, 2026Updated 4 months ago
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
yanghaha0908 / WavCube
View on GitHub
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62Jun 27, 2026Updated last month
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
xiquan-li / FineLAP
View on GitHub
[ACL 2026 Main] FineLAP: Taming Heterogeneous Supervision for Fine-grained Language-Audio Pre-training
☆36Apr 20, 2026Updated 3 months ago
ASLP-lab / FastTurn
View on GitHub
☆35May 19, 2026Updated 2 months ago
NKU-HLT / Emotion-Recognition
View on GitHub
Paper List
☆18Jul 2, 2025Updated last year
xzf-thu / Voices-in-the-Wild-Bench
View on GitHub
☆28May 22, 2026Updated 2 months ago
zeyuxie29 / SemanticVocoder
View on GitHub
☆28Apr 6, 2026Updated 3 months ago
ASLP-lab / OmniCodec
View on GitHub
OmniCodec: Low Frame Rate Universal Audio Codec with Semantic–Acoustic Disentanglement
☆46Apr 17, 2026Updated 3 months ago
NKU-HLT / AudioEditor
View on GitHub
☆47Apr 2, 2025Updated last year
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆121Jun 21, 2026Updated last month
XiaomiMiMo / MiMo-Audio-Training
View on GitHub
☆109Oct 16, 2025Updated 9 months ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
wenet-e2e / west
View on GitHub
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206Jul 17, 2026Updated last week
inclusionAI / AudioMCQ
View on GitHub
[ICLR 2026] AudioMCQ: A 571k audio multiple-choice question dataset for post-training Large Audio Language Models with dual CoT annotatio…
☆51Apr 21, 2026Updated 3 months ago
xiquan-li / Resonate
View on GitHub
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48Apr 17, 2026Updated 3 months ago
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 5 months ago
Bai-YT / ConsistencyTTA
View on GitHub
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
☆39Nov 20, 2024Updated last year
ictnlp / SLED-TTS
View on GitHub
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108May 20, 2025Updated last year
Ruiqi-Yan / Awesome-Full-Duplex-SDM
View on GitHub
A curated list of full-duplex spoken dialogue models & benchmarks
☆147Updated this week
Sakshi113 / MMAU
View on GitHub
☆156Feb 9, 2026Updated 5 months ago
ASLP-lab / OSUM-Pangu
View on GitHub
An Open-Source Multidimension Speech Understanding Foundation Model Built upon OpenPangu on Ascend NPUs
☆33Mar 15, 2026Updated 4 months ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
Aria-K-Alethia / BigCodec
View on GitHub
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
☆218Sep 19, 2024Updated last year
vivian556123 / NeurIPS2024-CoVoMix
View on GitHub
Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
☆67Jan 16, 2025Updated last year
sonalkum / MMAUPro
View on GitHub
Official repo for MMAU-Pro Benchmark
☆22Sep 25, 2025Updated 10 months ago
facebookresearch / lst
View on GitHub
Code for Latent Speech-Text Transformer (LST)
☆35Mar 12, 2026Updated 4 months ago
kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆141Feb 4, 2026Updated 5 months ago
NKU-HLT / Fusion-Insider-threat-detection
View on GitHub
[ICANN 2023] Anomaly-Based Insider Threat Detection via Hierarchical Information Fusion
☆18Nov 20, 2023Updated 2 years ago
ASLP-lab / FMSU-Bench
View on GitHub
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25May 21, 2026Updated 2 months ago