Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
☆22Aug 14, 2025Updated 6 months ago
Alternatives and similar repositories for SAKURA
Users who are interested in SAKURA are comparing it to the repositories listed below
- Small audio language model for reasoning☆86Dec 4, 2025Updated 3 months ago
- ☆13Sep 25, 2024Updated last year
- Code for DeSTA2.5-Audio, general-purpose LALM☆128Feb 4, 2026Updated last month
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated 11 months ago
- Evaluation code for benchmarking VLMs in Traditional Chinese understanding☆13Dec 22, 2025Updated 2 months ago
- Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra☆16Dec 10, 2024Updated last year
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge☆12Jun 11, 2024Updated last year
- Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless…☆35Aug 10, 2023Updated 2 years ago
- Official repository for the 1st DAFx Parameter Estimation Challenge☆35Feb 16, 2026Updated 2 weeks ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…☆27Jul 11, 2025Updated 7 months ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024☆20Jul 27, 2024Updated last year
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"☆121Jul 15, 2025Updated 7 months ago
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm☆81Oct 19, 2023Updated 2 years ago
- The official repository of Dynamic-SUPERB.☆197Jun 24, 2025Updated 8 months ago
- AI composer☆20Apr 19, 2017Updated 8 years ago
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".☆469Apr 24, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆153Dec 5, 2024Updated last year
- Audio Codec Speech processing Universal PERformance Benchmark☆297Jan 8, 2026Updated last month
- Audio-FLAN☆160Sep 23, 2025Updated 5 months ago
- The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.☆284May 15, 2025Updated 9 months ago
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆197Feb 25, 2026Updated last week
- ☆133Jan 24, 2026Updated last month
- Zero-Resource Speech Discovery, Search, and Evaluation Tools☆29Aug 6, 2015Updated 10 years ago
- ☆130Feb 9, 2026Updated 3 weeks ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆127Dec 9, 2024Updated last year
- ☆24Sep 10, 2025Updated 5 months ago
- Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"☆68Dec 13, 2021Updated 4 years ago
- Recursive Neural Tensor Networks☆11Feb 3, 2014Updated 12 years ago
- The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper☆16Aug 8, 2025Updated 6 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆38Sep 8, 2024Updated last year
- Dataset of dry/wet pairs for audio effects research☆37Apr 17, 2025Updated 10 months ago
- ☆41May 15, 2023Updated 2 years ago
- Comprehensive quantitative comparison of lossless and lossy audio codecs☆39Feb 11, 2023Updated 3 years ago
- An Audio Language model for Audio Tasks☆319Apr 19, 2024Updated last year
- An audio and transcribed corpus of contemporary Hong Kong Cantonese☆40Dec 30, 2020Updated 5 years ago
- Detecting and correcting dysfluencies/stuttering/stammering in audio files☆10Apr 23, 2023Updated 2 years ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily☆15Jan 6, 2025Updated last year
- A Python algorithm to change the pitch of the voice in real time☆13Dec 13, 2020Updated 5 years ago