Official GitHub repository for the paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
☆24 · Aug 14, 2025 · Updated 9 months ago
Alternatives and similar repositories for SAKURA
Users who are interested in SAKURA are comparing it to the libraries listed below.
- Evaluation code for benchmarking VLMs in Traditional Chinese understanding ☆14 · Dec 22, 2025 · Updated 5 months ago
- Code for DeSTA2.5-Audio, general-purpose LALM ☆139 · Feb 4, 2026 · Updated 3 months ago
- ☆13 · Sep 25, 2024 · Updated last year
- Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra ☆16 · Dec 10, 2024 · Updated last year
- Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless… ☆35 · Aug 10, 2023 · Updated 2 years ago
- small audio language model for reasoning ☆86 · Dec 4, 2025 · Updated 5 months ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024 ☆21 · Jul 27, 2024 · Updated last year
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data" ☆126 · Jul 15, 2025 · Updated 10 months ago
- The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper ☆16 · May 11, 2026 · Updated 2 weeks ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… ☆31 · Jul 11, 2025 · Updated 10 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 · Jun 11, 2024 · Updated last year
- Audio Codec Speech processing Universal PERformance Benchmark ☆305 · May 5, 2026 · Updated 2 weeks ago
- The official repository of Dynamic-SUPERB. ☆200 · Jun 24, 2025 · Updated 11 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆34 · Mar 14, 2025 · Updated last year
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand". ☆472 · Apr 24, 2024 · Updated 2 years ago
- The repo for reproducing the main results in TSMixer: An all-MLP Architecture for Time Series Forecasting. ☆10 · Jun 15, 2023 · Updated 2 years ago
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with prompting paradigm ☆81 · Oct 19, 2023 · Updated 2 years ago
- ☆15 · Apr 4, 2025 · Updated last year
- ☆41 · May 15, 2023 · Updated 3 years ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆154 · Dec 5, 2024 · Updated last year
- AI composer ☆20 · Apr 19, 2017 · Updated 9 years ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension ☆131 · Dec 9, 2024 · Updated last year
- Audio-FLAN ☆160 · Sep 23, 2025 · Updated 8 months ago
- Official PyTorch implementation of "LGViT: Dynamic Early Exiting for Accelerating Vision Transformer" (ACM MM 2023) ☆16 · Nov 18, 2024 · Updated last year
- Comprehensive quantitative comparison of lossless and lossy audio codecs ☆40 · Feb 11, 2023 · Updated 3 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 · Jun 7, 2024 · Updated last year
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆208 · Feb 25, 2026 · Updated 2 months ago
- Event Relation in Text-to-Audio (TTA) Generation ☆21 · Feb 26, 2025 · Updated last year
- ☆138 · Jan 24, 2026 · Updated 4 months ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆40 · Sep 8, 2024 · Updated last year
- NEAL (Nature+Energy Audio Labeller) is an open-source interactive audio data annotation tool. ☆18 · Apr 7, 2025 · Updated last year
- Masked Modeling Duo: Towards a Universal Audio Pre-training Framework ☆152 · Feb 23, 2026 · Updated 3 months ago
- Non-parallel voice conversion called ICRCycleGAN-VC based on CycleGAN and Inception-resNet module by Afiuny ☆15 · Apr 15, 2026 · Updated last month
- ☆149 · Feb 9, 2026 · Updated 3 months ago
- ☆11 · Oct 20, 2022 · Updated 3 years ago
- Uyghur text resource crawled from website ☆12 · Dec 25, 2015 · Updated 10 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024) ☆151 · Sep 14, 2023 · Updated 2 years ago
- Lightweight Python library for speaker diarization in real time implemented in PyTorch ☆11 · Oct 12, 2022 · Updated 3 years ago
- Lifelong Variational Autoencoder ☆15 · Dec 6, 2017 · Updated 8 years ago