This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆52 · Jul 28, 2025 · Updated 9 months ago
Alternatives and similar repositories for Qwen2-Audio-finetune
Users that are interested in Qwen2-Audio-finetune are comparing it to the libraries listed below.
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-… ☆50 · May 14, 2025 · Updated 11 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers. ☆24 · Nov 23, 2024 · Updated last year
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts ☆41 · Jun 12, 2025 · Updated 10 months ago
- Target speaker automatic speech recognition (TS-ASR) ☆13 · Oct 14, 2023 · Updated 2 years ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆40 · Sep 8, 2024 · Updated last year
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis" ☆14 · Nov 5, 2024 · Updated last year
- ☆23 · Oct 17, 2024 · Updated last year
- ☆115 · Oct 21, 2025 · Updated 6 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 · Jun 11, 2024 · Updated last year
- ☆15 · Apr 2, 2025 · Updated last year
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆31 · Sep 20, 2025 · Updated 7 months ago
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization ☆20 · May 14, 2025 · Updated 11 months ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner. ☆134 · Apr 7, 2026 · Updated last month
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆207 · Feb 25, 2026 · Updated 2 months ago
- Real-time speech enhancement ☆17 · Jan 23, 2024 · Updated 2 years ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆323 · Mar 28, 2025 · Updated last year
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆23 · Jun 6, 2025 · Updated 11 months ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu… ☆27 · Mar 5, 2024 · Updated 2 years ago
- ☆11 · Oct 20, 2022 · Updated 3 years ago
- ☆15 · Apr 4, 2025 · Updated last year
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re… ☆13 · Oct 8, 2025 · Updated 7 months ago
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts ☆21 · Nov 11, 2024 · Updated last year
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆35 · Mar 14, 2025 · Updated last year
- Vox-Profile Benchmark ☆76 · Feb 16, 2026 · Updated 2 months ago
- small audio language model for reasoning ☆85 · Dec 4, 2025 · Updated 5 months ago
- The baselines of ARC-Challenge-Interspeech2026 ☆59 · Dec 1, 2025 · Updated 5 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance. ☆131 · Feb 13, 2026 · Updated 2 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆1,211 · Dec 17, 2025 · Updated 4 months ago
- The baseline system for the ICASSP2024 ICMC-ASR Challenge. ☆56 · Dec 6, 2023 · Updated 2 years ago
- A PyTorch Lightning training implementation of SLAM-ASR ☆11 · Nov 17, 2025 · Updated 5 months ago
- The reproduction code for Qwen-Audio fine-tuning ☆53 · Feb 28, 2026 · Updated 2 months ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆26 · Aug 11, 2024 · Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆1,029 · Jan 15, 2026 · Updated 3 months ago
- Keyword Search Recipe for Subword ASR ☆30 · Jul 12, 2019 · Updated 6 years ago
- Simplistic Implementation of Zipformer: A faster and better encoder for automatic speech recognition in PyTorch ☆20 · Jun 3, 2024 · Updated last year
- SoTA open-source TTS ☆26 · Jul 8, 2025 · Updated 10 months ago
- PolEval 2021 Task 1 ☆15 · Jun 28, 2022 · Updated 3 years ago
- CTC decoder with hotwords for ASR. ☆35 · Apr 13, 2025 · Updated last year
- ☆37 · Jun 30, 2022 · Updated 3 years ago