This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆52Jul 28, 2025Updated 8 months ago
Alternatives and similar repositories for Qwen2-Audio-finetune
Users that are interested in Qwen2-Audio-finetune are comparing it to the libraries listed below.
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…☆50May 14, 2025Updated 11 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts☆41Jun 12, 2025Updated 10 months ago
- Target speaker automatic speech recognition (TS-ASR)☆13Oct 14, 2023Updated 2 years ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆39Sep 8, 2024Updated last year
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"☆14Nov 5, 2024Updated last year
- ☆114Oct 21, 2025Updated 5 months ago
- ☆23Oct 17, 2024Updated last year
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge☆12Jun 11, 2024Updated last year
- ☆15Apr 2, 2025Updated last year
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…☆30Sep 20, 2025Updated 6 months ago
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization☆20May 14, 2025Updated 11 months ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆130Apr 7, 2026Updated last week
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆202Feb 25, 2026Updated last month
- 🤗 R1-AQA Model: mispeech/r1-aqa☆320Mar 28, 2025Updated last year
- Real-time speech enhancement☆17Jan 23, 2024Updated 2 years ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 10 months ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated 2 years ago
- ☆11Oct 20, 2022Updated 3 years ago
- ☆15Apr 4, 2025Updated last year
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re…☆13Oct 8, 2025Updated 6 months ago
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts☆21Nov 11, 2024Updated last year
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆34Mar 14, 2025Updated last year
- Vox-Profile Benchmark☆75Feb 16, 2026Updated 2 months ago
- small audio language model for reasoning☆85Dec 4, 2025Updated 4 months ago
- The baselines of ARC-Challenge-Interspeech2026☆58Dec 1, 2025Updated 4 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance.☆129Feb 13, 2026Updated 2 months ago
- The baseline system for the ICASSP2024 ICMC-ASR Challenge.☆55Dec 6, 2023Updated 2 years ago
- A PyTorch Lightning training implementation of SLAM-ASR☆11Nov 17, 2025Updated 5 months ago
- The reproduction code for Qwen-Audio fine-tuning☆53Feb 28, 2026Updated last month
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models☆27Aug 11, 2024Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆1,020Jan 15, 2026Updated 3 months ago
- Keyword Search Recipe for Subword ASR☆30Jul 12, 2019Updated 6 years ago
- Simplistic Implementation of Zipformer: A faster and better encoder for automatic speech recognition in PyTorch☆20Jun 3, 2024Updated last year
- SoTA open-source TTS☆26Jul 8, 2025Updated 9 months ago
- PolEval 2021 Task 1☆15Jun 28, 2022Updated 3 years ago
- CTC decoder with hotwords for ASR.☆35Apr 13, 2025Updated last year
- ☆37Jun 30, 2022Updated 3 years ago
- A neural speech codec based on discrete WavLM representations☆26Aug 28, 2024Updated last year