Official GitHub repository for paper "SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information" (Interspeech 2025)
☆24Aug 14, 2025Updated 8 months ago
Alternatives and similar repositories for SAKURA
Users that are interested in SAKURA are comparing it to the libraries listed below.
- Evaluation code for benchmarking VLMs in Traditional Chinese understanding☆14Dec 22, 2025Updated 3 months ago
- Code for DeSTA2.5-Audio, general-purpose LALM☆136Feb 4, 2026Updated 2 months ago
- ☆13Sep 25, 2024Updated last year
- Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra☆16Dec 10, 2024Updated last year
- Textless (ASR-transcript free) Spoken Question Answering. The official release of NMSQA dataset and the implementation of "DUAL: Textless…☆35Aug 10, 2023Updated 2 years ago
- small audio language model for reasoning☆85Dec 4, 2025Updated 4 months ago
- A deepfake audio dataset for detecting fake speech from codec-based speech synthesis systems, Interspeech 2024☆20Jul 27, 2024Updated last year
- Code and model for ICASSP 2025 Paper "Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"☆124Jul 15, 2025Updated 8 months ago
- The dataset repo of "CLCIFAR: CIFAR-Derived Benchmark Datasets with Human Annotated Complementary Labels" paper☆16Aug 8, 2025Updated 8 months ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…☆31Jul 11, 2025Updated 9 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge☆12Jun 11, 2024Updated last year
- The official repository of Dynamic-SUPERB.☆200Jun 24, 2025Updated 9 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆34Mar 14, 2025Updated last year
- Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".☆473Apr 24, 2024Updated last year
- The repo for reproducing the main results in TSMixer: An all-MLP Architecture for Time Series Forecasting.☆10Jun 15, 2023Updated 2 years ago
- "SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks": speech processing with the prompting paradigm☆81Oct 19, 2023Updated 2 years ago
- ☆15Apr 4, 2025Updated last year
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆26Jul 21, 2025Updated 8 months ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆154Dec 5, 2024Updated last year
- ☆41May 15, 2023Updated 2 years ago
- AI Composer (AI作曲家)☆20Apr 19, 2017Updated 8 years ago
- Audio-FLAN☆159Sep 23, 2025Updated 6 months ago
- AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension☆126Dec 9, 2024Updated last year
- Official PyTorch implementation of "LGViT: Dynamic Early Exiting for Accelerating Vision Transformer" (ACM MM 2023)☆16Nov 18, 2024Updated last year
- Comprehensive quantitative comparison of lossless and lossy audio codecs☆40Feb 11, 2023Updated 3 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee…☆82Jun 7, 2024Updated last year
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆202Feb 25, 2026Updated last month
- The first Large Audio Language Model that enables native in-depth thinking, which is trained on large-scale audio Chain-of-Thought data.☆291May 15, 2025Updated 10 months ago
- Event Relation in Text-to-Audio (TTA) Generation☆20Feb 26, 2025Updated last year
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆39Sep 8, 2024Updated last year
- ☆136Jan 24, 2026Updated 2 months ago
- NEAL (Nature+Energy Audio Labeller) is an open-source interactive audio data annotation tool.☆18Apr 7, 2025Updated last year
- Masked Modeling Duo: Towards a Universal Audio Pre-training Framework☆146Feb 23, 2026Updated last month
- ☆141Feb 9, 2026Updated 2 months ago
- Non-parallel voice conversion called ICRCycleGAN-VC based on CycleGAN and Inception-resNet module by Afiuny☆15Oct 30, 2025Updated 5 months ago
- ☆11Oct 20, 2022Updated 3 years ago
- Uyghur text resources crawled from websites☆12Dec 25, 2015Updated 10 years ago
- Unified Speech Language Model for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models" (ICLR 2024)☆151Sep 14, 2023Updated 2 years ago
- Lightweight Python library for real-time speaker diarization, implemented in PyTorch☆11Oct 12, 2022Updated 3 years ago