kehanlu / DeSTA2.5-Audio
Code for DeSTA2.5-Audio, general-purpose LALM
☆128 · Feb 4, 2026 · Updated last week
Alternatives and similar repositories for DeSTA2.5-Audio
Users who are interested in DeSTA2.5-Audio are comparing it to the repositories listed below.
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data" ☆120 · Jul 15, 2025 · Updated 6 months ago
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec. ☆48 · Sep 4, 2023 · Updated 2 years ago
- Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa… ☆21 · Aug 14, 2025 · Updated 5 months ago
- ESLTTS dataset ☆16 · Feb 6, 2025 · Updated last year
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… ☆27 · Jul 11, 2025 · Updated 7 months ago
- Official Implementation of GLAP - General Language Audio Pretraining ☆61 · Jan 5, 2026 · Updated last month
- small audio language model for reasoning ☆86 · Dec 4, 2025 · Updated 2 months ago
- A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat… ☆111 · Sep 3, 2025 · Updated 5 months ago
- A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models ☆124 · Sep 21, 2025 · Updated 4 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs) ☆40 · Aug 11, 2025 · Updated 6 months ago
- Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra ☆16 · Dec 10, 2024 · Updated last year
- The official repository of Dynamic-SUPERB. ☆197 · Jun 24, 2025 · Updated 7 months ago
- This repo contains the official PyTorch implementation of "Analyzing Discrete Self Supervised Speech Representation For Spoken Language M… ☆20 · Jan 3, 2023 · Updated 3 years ago
- This repository contains the code for the paper "voc2vec: A Foundation Model for Non-Verbal Vocalization", accepted at ICASSP 2025. ☆47 · Apr 14, 2025 · Updated 9 months ago
- Codebase for 'Scaling Rich Style-Prompted Text-to-Speech Datasets' ☆153 · Mar 24, 2025 · Updated 10 months ago
- Audio-FLAN ☆160 · Sep 23, 2025 · Updated 4 months ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆152 · Sep 14, 2023 · Updated 2 years ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆195 · Dec 13, 2025 · Updated 2 months ago
- Audio Codec Speech processing Universal PERformance Benchmark ☆296 · Jan 8, 2026 · Updated last month
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》 ☆77 · Jun 9, 2023 · Updated 2 years ago
- The accompanying code for "Exploring the limits of decoder-only models trained on public speech recognition corpora" (Ankit Gupta, George… ☆20 · Oct 11, 2024 · Updated last year
- Leaderboard and code for "Speech-IFEval", Interspeech 2025 ☆24 · May 27, 2025 · Updated 8 months ago
- A benchmark corpus for ASR hypothesis revising task ☆21 · Sep 26, 2023 · Updated 2 years ago
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT ☆40 · Aug 29, 2024 · Updated last year
- Official implementation of "Unsupervised Pre-training for Data-Efficient Text-to-Speech on Low Resource Languages", ICASSP 2023 ☆27 · Apr 27, 2023 · Updated 2 years ago
- [EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers ☆125 · Mar 20, 2025 · Updated 10 months ago
- Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5 ☆19 · Nov 29, 2022 · Updated 3 years ago
- ☆43 · Sep 3, 2025 · Updated 5 months ago
- Code for vec2wav 2.0, a speech token vocoder for VC. Paper: https://arxiv.org/abs/2409.01995 ☆78 · Dec 3, 2024 · Updated last year
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》 Speech processing with prompting paradigm ☆82 · Oct 19, 2023 · Updated 2 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models ☆35 · Oct 13, 2024 · Updated last year
- AudioBench: A Universal Benchmark for Audio Large Language Models ☆295 · Jun 17, 2025 · Updated 7 months ago
- An AR+AR TTS attempt. ☆18 · Jan 13, 2025 · Updated last year
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications ☆86 · Dec 20, 2024 · Updated last year
- ☆19 · Mar 22, 2024 · Updated last year
- EMO-SUPERB submission ☆50 · Oct 13, 2025 · Updated 4 months ago
- unofficial pytorch implementation of HiFi-GAN with fast MISR. ☆15 · Mar 21, 2023 · Updated 2 years ago
- ☆31 · Jul 13, 2023 · Updated 2 years ago
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆469 · Apr 24, 2024 · Updated last year