kehanlu/DeSTA2

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kehanlu/DeSTA2)

kehanlu / DeSTA2

Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

☆127

Alternatives and similar repositories for DeSTA2

Users that are interested in DeSTA2 are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kehanlu / DeSTA2.5-Audio
View on GitHub
Code for DeSTA2.5-Audio, general-purpose LALM
☆140Feb 4, 2026Updated 5 months ago
kehanlu / Speech-IFEval
View on GitHub
Leaderboard and code for "Speech-IFEval", Interspeech 2025
☆24May 27, 2025Updated last year
Alfred0622 / HypR
View on GitHub
A benchmark corpus for ASR hypothesis revising task
☆21Sep 26, 2023Updated 2 years ago
ga642381 / Spoken-Dialogue-Model-Survey
View on GitHub
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31Mar 24, 2026Updated 4 months ago
voidful / llm-codec
View on GitHub
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23May 3, 2026Updated 2 months ago
AI Agents on DigitalOcean Gradient AI Platform • Ad
Build production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
voidful / MMLM
View on GitHub
Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra
☆16Dec 10, 2024Updated last year
voidful / asrp
View on GitHub
ASR text preprocessing utility
☆21Aug 5, 2024Updated last year
ckyang1124 / LALM-Evaluation-Survey
View on GitHub
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41Aug 11, 2025Updated 11 months ago
kehanlu / python
View on GitHub
臺科大程式設計社 2019 spring
☆25May 28, 2019Updated 7 years ago
kehanlu / Mandarin-Wav2Vec2
View on GitHub
Pre-trained Wav2vec2.0 for Mandarin
☆43Oct 30, 2022Updated 3 years ago
the-bird-F / GLM-Voice-RAG
View on GitHub
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31Jul 11, 2025Updated last year
mtkresearch / TASTE-SpokenLM
View on GitHub
A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat…
☆119Sep 3, 2025Updated 10 months ago
kehanlu / server-monitor
View on GitHub
A light webserver for monitoring RAM and GPU usage on multiple servers.
☆21Mar 31, 2021Updated 5 years ago
kehanlu / University
View on GitHub
臺科併校小幫手 🍡
☆13Apr 21, 2023Updated 3 years ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
hhhaaahhhaa / ASR-TTA
View on GitHub
☆16Nov 4, 2025Updated 8 months ago
dynamic-superb / dynamic-superb
View on GitHub
The official repository of Dynamic-SUPERB.
☆200Jun 24, 2025Updated last year
nervjack2 / Speech2Unit
View on GitHub
☆13Sep 25, 2024Updated last year
ckyang1124 / SAKURA
View on GitHub
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…
☆25Aug 14, 2025Updated 11 months ago
Soul-AILab / SAC
View on GitHub
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108Nov 1, 2025Updated 8 months ago
ga642381 / speech-trident
View on GitHub
Awesome speech/audio LLMs, representation learning, and codec models
☆1,240Jul 10, 2026Updated 2 weeks ago
kuan2jiu99 / audio-hallucination
View on GitHub
Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024
☆34Mar 14, 2025Updated last year
yistLin / beamertheme-yeast
View on GitHub
Yeast, a lite and light beamer theme
☆18Dec 6, 2020Updated 5 years ago
thuhcsi / Contextual-Biasing-Dataset
View on GitHub
open-source Mandarian biased word dataset
☆14Sep 21, 2023Updated 2 years ago
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Ereboas / MagiCodec
View on GitHub
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
☆125Jun 4, 2025Updated last year
yzGuu830 / efficient-speech-codec
View on GitHub
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126Mar 20, 2025Updated last year
B06901052 / DeepSpeed
View on GitHub
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13Oct 11, 2022Updated 3 years ago
DanielLin94144 / Full-Duplex-Bench
View on GitHub
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
☆240May 20, 2026Updated 2 months ago
d223302 / A-Closer-Look-To-LLM-Evaluation
View on GitHub
Code for EMNLP 2023 findings paper "A Closer Look into Using Large Language Models for Automatic Evaluation"
☆19Oct 9, 2023Updated 2 years ago
ajd12342 / paraspeechcaps
View on GitHub
Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets'
☆163Mar 26, 2026Updated 3 months ago
rithiksachdev / PostASR-Correction-SLT2024
View on GitHub
☆18Jul 22, 2024Updated 2 years ago
dcaulley / av_diarization
View on GitHub
AudioVisual Diarization - Supervised and Unsupervised
☆15Nov 22, 2022Updated 3 years ago
ga642381 / FlappyBird
View on GitHub
Super Flappy Bird in p5.js
☆10Mar 8, 2021Updated 5 years ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
ex3ndr / supervoice-gpt-facodec
View on GitHub
GPT for FACodec
☆13Mar 25, 2024Updated 2 years ago
ddlBoJack / MMAR
View on GitHub
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214Feb 25, 2026Updated 4 months ago
ZhikangNiu / Semantic-VAE
View on GitHub
[INTERSPEECH 2026 Oral]Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
☆120Jun 21, 2026Updated last month
ga642381 / AudioCodec-Hub
View on GitHub
AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models
☆25Sep 26, 2023Updated 2 years ago
ga642381 / SpeechPrompt-v2
View on GitHub
《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm
☆81Oct 19, 2023Updated 2 years ago
voidful / Codec-SUPERB
View on GitHub
Audio Codec Speech processing Universal PERformance Benchmark
☆308Jul 4, 2026Updated 3 weeks ago
Choddeok / Affectron
View on GitHub
[ACL 2026 Findings] Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
☆20Jul 16, 2026Updated last week