ckyang1124/LALM-Evaluation-Survey

ckyang1124 / LALM-Evaluation-Survey

Collection of works for evaluating (and analyzing) large audio-language models (LALMs)

☆41

Alternatives and similar repositories for LALM-Evaluation-Survey

Users interested in LALM-Evaluation-Survey are comparing it to the repositories listed below.

ckyang1124 / SAKURA
Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Informa…
☆25 · Aug 14, 2025 · Updated 11 months ago
kehanlu / DeSTA2.5-Audio
Code for DeSTA2.5-Audio, general-purpose LALM
☆141 · Feb 4, 2026 · Updated 5 months ago
soham97 / ADIFF
Explaining audio differences using language
☆16 · Feb 11, 2025 · Updated last year
AmphionTeam / SD-Eval
[NeurIPS 2024] SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
☆57 · Jun 25, 2024 · Updated 2 years ago
kehanlu / DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127 · Jul 15, 2025 · Updated last year
OFA-Sys / AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133 · Dec 9, 2024 · Updated last year
ddlBoJack / MT4SSL
[INTERSPEECH 2023 Best Paper Shortlist] Official implementation for MT4SSL: Boosting Self-Supervised Speech Representation Learning by In…
☆45 · Mar 25, 2024 · Updated 2 years ago
Sakshi113 / MMAU
☆156 · Feb 9, 2026 · Updated 5 months ago
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
ga642381 / Spoken-Dialogue-Model-Survey
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31 · Mar 24, 2026 · Updated 4 months ago
dynamic-superb / dynamic-superb
The official repository of Dynamic-SUPERB.
☆200 · Jun 24, 2025 · Updated last year
yzyouzhang / Audio_Research_in_US
Audio Research in US. US-based professors who work on audio (music, speech, acoustics). For students who would like to apply for RA, PhD,…
☆27 · Feb 27, 2026 · Updated 5 months ago
nervjack2 / Speech2Unit
☆13 · Sep 25, 2024 · Updated last year
kehanlu / Speech-IFEval
Leaderboard and code for "Speech-IFEval", Interspeech 2025
☆24 · May 27, 2025 · Updated last year
microsoft / NoAudioCaptioning
Repository for "Training Audio Captioning Models without Audio"
☆10 · Sep 26, 2023 · Updated 2 years ago
microsoft / AudioEntailment
Audio Entailment: Deductive Reasoning for Audio Understanding
☆17 · Dec 10, 2024 · Updated last year
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 5 months ago
B06901052 / DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
☆13 · Oct 11, 2022 · Updated 3 years ago
ga642381 / SpeechGen
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77 · Jun 9, 2023 · Updated 3 years ago
ga642381 / SpeechPrompt-v2
《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》 Speech processing with prompting paradigm
☆81 · Oct 19, 2023 · Updated 2 years ago
voidful / llm-codec
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23 · May 3, 2026 · Updated 2 months ago
soham97 / mellow
small audio language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
DanielLin94144 / Full-Duplex-Bench
A Benchmark for Evaluating Turn-Taking and Overlap Handling in Full-Duplex Spoken Dialogue Models
☆245 · May 20, 2026 · Updated 2 months ago
nii-yamagishilab / SpeechSPC-mini
Speech Security and Privacy Compendium - Mini
☆10 · Jun 18, 2024 · Updated 2 years ago
unilight / sheet
Speech Human Evaluation Estimation Toolkit (SHEET)
☆138 · Mar 31, 2026 · Updated 3 months ago
ag027592 / EMO-SUPERB
EMO-SUPERB: a reproducible speech emotion recognition benchmark with leakage-free splits for 6 datasets and 15 speech SSL models (IEEE SL…
☆51 · Updated this week
YuanGongND / ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478 · Apr 24, 2024 · Updated 2 years ago
roger-tseng / CodecFake
A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024
☆22 · Jul 27, 2024 · Updated 2 years ago
DanielLin94144 / StyleTalk
Official release of StyleTalk dataset.
☆75 · Jul 1, 2024 · Updated 2 years ago
Alfred0622 / HypR
A benchmark corpus for ASR hypothesis revising task
☆21 · Sep 26, 2023 · Updated 2 years ago
jishengpeng / WavReward
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56 · May 15, 2025 · Updated last year
WangHelin1997 / Automatic_Speech_Annotator
Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…
☆33 · Jun 14, 2024 · Updated 2 years ago
yzyouzhang / SASV_PR
Official implementation of the Odyssey paper "A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification"
☆18 · Jun 24, 2022 · Updated 4 years ago
AlanBaade / SyllableLM
Official Code for SyllableLM: Learning Coarse Semantic Units for Speech Language Models
☆63 · Jul 1, 2025 · Updated last year
dreamtheater123 / VoxEval
GitHub repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
☆24 · Jun 16, 2025 · Updated last year
yzGuu830 / efficient-speech-codec
[EMNLP 2024] ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
☆126 · Mar 20, 2025 · Updated last year
Splend1d / T5lephone
Code for T5lephone: Bridging Speech and Text Self-supervised Models for Spoken Language Understanding via Phoneme level T5
☆19 · Nov 29, 2022 · Updated 3 years ago
Yaselley / deepfense-framework
DeepFense: A Unified, Modular, and Extensible Framework for Robust Deepfake Audio Detection
☆27 · Jul 11, 2026 · Updated 2 weeks ago
mtkresearch / TASTE-SpokenLM
A method that directly addresses the modality gap by aligning speech token with the corresponding text transcription during the tokenizat…
☆119 · Sep 3, 2025 · Updated 10 months ago