ckyang1124/SAKURA



Official GitHub repository for the paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)

☆25

Alternatives and similar repositories for SAKURA

Users who are interested in SAKURA are comparing it to the repositories listed below.


kehanlu / DeSTA2.5-Audio
Code for DeSTA2.5-Audio, a general-purpose LALM
☆140 · Feb 4, 2026 · Updated 5 months ago
ckyang1124 / LALM-Evaluation-Survey
Collection of works for evaluating (and analyzing) large audio-language models (LALMs)
☆41 · Aug 11, 2025 · Updated 11 months ago
DanielLin94144 / DUAL-textless-SQA
Textless (ASR-transcript-free) spoken question answering. The official release of the NMSQA dataset and the implementation of "DUAL: Textless…
☆35 · Aug 10, 2023 · Updated 2 years ago
soham97 / mellow
Small audio-language model for reasoning
☆88 · Dec 4, 2025 · Updated 7 months ago
roger-tseng / CodecFake
A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024
☆22 · Jul 27, 2024 · Updated last year
TMMMU-Benchmark / evaluation
Evaluation code for benchmarking VLMs on Traditional Chinese understanding
☆14 · Dec 22, 2025 · Updated 7 months ago
nervjack2 / Speech2Unit
☆13 · Sep 25, 2024 · Updated last year
voidful / MMLM
Toward a multi-modality language model: an implementation of GPT-4o/Project Astra
☆16 · Dec 10, 2024 · Updated last year
dynamic-superb / dynamic-superb
The official repository of Dynamic-SUPERB.
☆200 · Jun 24, 2025 · Updated last year
kehanlu / DeSTA2
Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
☆127 · Jul 15, 2025 · Updated last year
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Apr 17, 2024 · Updated 2 years ago
rithiksachdev / PostASR-Correction-SLT2024
☆18 · Jul 22, 2024 · Updated 2 years ago
voidful / Codec-SUPERB
Audio Codec Speech processing Universal PERformance Benchmark
☆308 · Jul 4, 2026 · Updated 3 weeks ago
L6-NLP / Generative-Annotation-NEC
Generative_Annotation_NEC: A novel NEC method that utilizes speech sound features to retrieve candidate entities and a generative method …
☆17 · Dec 2, 2025 · Updated 7 months ago
Hypotheses-Paradise / UADF
☆17 · May 5, 2024 · Updated 2 years ago
OFA-Sys / AIR-Bench
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
☆133 · Dec 9, 2024 · Updated last year
Sakshi113 / MMAU
☆156 · Feb 9, 2026 · Updated 5 months ago
ntucllab / CLImage_Dataset
The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper
☆17 · May 11, 2026 · Updated 2 months ago
ga642381 / Spoken-Dialogue-Model-Survey
A survey of spoken dialogue models (SDMs) with speech input and speech output. Focus on their Intermediate Representation and Generation …
☆31 · Mar 24, 2026 · Updated 4 months ago
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 · Jun 11, 2024 · Updated 2 years ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 4 months ago
kogby / CJ-Latex-Resume
LaTeX Workshop (Spring 2024)
☆11 · Oct 20, 2024 · Updated last year
kuan2jiu99 / audio-hallucination
Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024
☆34 · Mar 14, 2025 · Updated last year
YuanGongND / ltu
Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
☆478 · Apr 24, 2024 · Updated 2 years ago
shuaijiang / Ke-Omni-R
Ke-Omni-R is an advanced audio reasoning model that achieved SOTA on MMAU
☆60 · Jun 11, 2025 · Updated last year
ga642381 / SpeechPrompt-v2
"SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm
☆81 · Oct 19, 2023 · Updated 2 years ago
kyegomez / USM
Implementation of Google's USM speech model in PyTorch
☆35 · Updated this week
jack1yang / image-paragraph-captioning
A Hierarchical Approach for Generating Descriptive Image Paragraphs
☆10 · Mar 27, 2020 · Updated 6 years ago
BogiHsu / Tacotron2-PyTorch
Yet another PyTorch implementation of Tacotron 2 with reduction factor and faster training speed.
☆148 · Apr 12, 2022 · Updated 4 years ago
nu-dialogue / moshi-finetune
Fine-tuning Moshi/J-Moshi on your own spoken dialogue data
☆101 · Jan 5, 2026 · Updated 6 months ago
anthony-wss / glm-4-voice-finetune
☆14 · Apr 4, 2025 · Updated last year
Splend1d / BreezyVoice
☆10 · Feb 16, 2025 · Updated last year
voidful / llm-codec
LLM-Codec: Neural Audio Codec Meets Language Model Objectives
☆23 · May 3, 2026 · Updated 2 months ago
krafton-ai / Raon-Speech
Open-source speech AI models from KRAFTON, including Raon-Speech and Raon-SpeechChat for speech understanding, generation, and real-time …
☆72 · Apr 7, 2026 · Updated 3 months ago
voidful / vall-e-encodec
☆41 · May 15, 2023 · Updated 3 years ago
Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆153 · Dec 5, 2024 · Updated last year
anthony-wss / tsmixer-reproduce
The repo for reproducing the main results in TSMixer: An all-MLP Architecture for Time Series Forecasting.
☆11 · Jun 15, 2023 · Updated 3 years ago
JusperLee / AudioTrust
AudioTrust: Benchmarking the Multi-faceted Trustworthiness of Audio Large Language Models
☆215 · Jan 28, 2026 · Updated 5 months ago
Honee-W / U-SAM
Official repository for U-SAM (Interspeech 2025)
☆28 · Jun 3, 2025 · Updated last year