kehanlu/DeSTA2.5-Audio

Code for DeSTA2.5-Audio, general-purpose LALM

☆141

Alternatives and similar repositories for DeSTA2.5-Audio

Users who are interested in DeSTA2.5-Audio are comparing it to the libraries listed below.

kehanlu / DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127 · Jul 15, 2025 · Updated last year
kehanlu / Speech-IFEval
Leaderboard and code for "Speech-IFEval", Interspeech 2025
☆24 · May 27, 2025 · Updated last year
ckyang1124 / SAKURA
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…
☆25 · Aug 14, 2025 · Updated 11 months ago
Alfred0622 / HypR
A benchmark corpus for ASR hypothesis revising task
☆21 · Sep 26, 2023 · Updated 2 years ago
ckyang1124 / LALM-Evaluation-Survey
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41 · Aug 11, 2025 · Updated 11 months ago
mtkresearch / TASTE-SpokenLM
A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat…
☆119 · Sep 3, 2025 · Updated 10 months ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 5 months ago
AmphionTeam / SpeechJudge
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
☆79 · Dec 23, 2025 · Updated 7 months ago
soham97 / mellow
small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
ajd12342 / paraspeechcaps
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆165 · Mar 26, 2026 · Updated 4 months ago
ga642381 / Spoken-Dialogue-Model-Survey
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31 · Mar 24, 2026 · Updated 4 months ago
Sakshi113 / MMAU
☆156 · Feb 9, 2026 · Updated 5 months ago
AudioLLMs / AudioBench
AudioBench: A Universal Benchmark for Audio Large Language Models
☆319 · May 29, 2026 · Updated 2 months ago
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated last year
xiaomi-research / dasheng-lm
Efficient audio understanding with general audio captions
☆429 · Apr 24, 2026 · Updated 3 months ago
XiaoMi / dasheng
Official PyTorch code for Deep Audio-Signal Holistic Embeddings
☆200 · Nov 7, 2025 · Updated 8 months ago
voidful / Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
☆308 · Jul 4, 2026 · Updated 3 weeks ago
Shy-98 / MELLE
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
☆41 · Jun 28, 2025 · Updated last year
Soul-AILab / SAC
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108 · Nov 1, 2025 · Updated 8 months ago
Honee-W / U-SAM
Official repository for U-SAM (Interspeech 2025)
☆28 · Jun 3, 2025 · Updated last year
3loi / NaturalVoices
☆61 · Oct 22, 2025 · Updated 9 months ago
voidful / llm-codec
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23 · May 3, 2026 · Updated 2 months ago
MatthewCYM / VoiceBench
[TACL'26] VoiceBench: Benchmarking LLM-Based Voice Assistants
☆378 · Jun 11, 2026 · Updated last month
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆198 · Feb 28, 2026 · Updated 5 months ago
0nutation / USLM
Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"(ICLR 2024)
☆152 · Sep 14, 2023 · Updated 2 years ago
DanielLin94144 / Full-Duplex-Bench
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
☆245 · May 20, 2026 · Updated 2 months ago
xiaomi-research / dasheng-glap
Official Implementation of GLAP - General Language Audio Pretraining
☆75 · May 14, 2026 · Updated 2 months ago
NKU-HLT / DIFFA
[AAAI 2026 & ACL 2026] The official implementation of the DIFFA series for dLLM-based large audio language model
☆83 · Apr 7, 2026 · Updated 3 months ago
ddlBoJack / Omni-Captioner
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142 · Apr 7, 2026 · Updated 3 months ago
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,050 · Jan 15, 2026 · Updated 6 months ago
ictnlp / SLED-TTS
Streamable Text-to-Speech model using a language modeling approach, without vector quantization
☆108 · May 20, 2025 · Updated last year
dynamic-superb / dynamic-superb
The official repository of Dynamic-SUPERB.
☆200 · Jun 24, 2025 · Updated last year
anthony-wss / glm-4-voice-finetune
☆14 · Apr 4, 2025 · Updated last year
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
SJTU-OmniAgent / VocalNet
☆123 · May 18, 2026 · Updated 2 months ago
xiaomi-research / dasheng-audiogen
end-to-end text to audio scene generation model
☆50 · Jun 16, 2026 · Updated last month
YuanGongND / ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478 · Apr 24, 2024 · Updated 2 years ago
Ereboas / MagiCodec
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125 · Jun 4, 2025 · Updated last year
MiscellaneousStuff / PhoneLM
(R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.
☆48 · Sep 4, 2023 · Updated 2 years ago