This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆51 · Jul 28, 2025 · Updated 10 months ago
Alternatives and similar repositories for Qwen2-Audio-finetune
Users that are interested in Qwen2-Audio-finetune are comparing it to the libraries listed below.
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-… ☆49 · May 14, 2025 · Updated last year
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers. ☆24 · Nov 23, 2024 · Updated last year
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts ☆41 · Jun 12, 2025 · Updated 11 months ago
- Target speaker automatic speech recognition (TS-ASR) ☆14 · Oct 14, 2023 · Updated 2 years ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud. ☆40 · Sep 8, 2024 · Updated last year
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis" ☆14 · Nov 5, 2024 · Updated last year
- ☆23 · Oct 17, 2024 · Updated last year
- ☆115 · Oct 21, 2025 · Updated 7 months ago
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge ☆12 · Jun 11, 2024 · Updated last year
- ☆15 · Apr 2, 2025 · Updated last year
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ… ☆31 · Sep 20, 2025 · Updated 8 months ago
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization ☆20 · May 14, 2025 · Updated last year
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner. ☆134 · Apr 7, 2026 · Updated last month
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix ☆210 · Feb 25, 2026 · Updated 3 months ago
- Real-time speech enhancement ☆17 · Jan 23, 2024 · Updated 2 years ago
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆324 · Mar 28, 2025 · Updated last year
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie… ☆23 · Jun 6, 2025 · Updated 11 months ago
- ☆11 · Oct 20, 2022 · Updated 3 years ago
- ☆15 · Apr 4, 2025 · Updated last year
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re… ☆13 · Oct 8, 2025 · Updated 7 months ago
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts ☆21 · Nov 11, 2024 · Updated last year
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu… ☆28 · Mar 5, 2024 · Updated 2 years ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024 ☆34 · Mar 14, 2025 · Updated last year
- Vox-Profile Benchmark ☆76 · Feb 16, 2026 · Updated 3 months ago
- small audio language model for reasoning ☆86 · Dec 4, 2025 · Updated 5 months ago
- The baselines of ARC-Challenge-Interspeech2026 ☆60 · Dec 1, 2025 · Updated 5 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance. ☆133 · Feb 13, 2026 · Updated 3 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆1,215 · May 8, 2026 · Updated 3 weeks ago
- The baseline system for the ICASSP2024 ICMC-ASR Challenge. ☆57 · Dec 6, 2023 · Updated 2 years ago
- The reproduction code for Qwen-Audio fine-tuning ☆54 · Feb 28, 2026 · Updated 3 months ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models ☆26 · Aug 11, 2024 · Updated last year
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model ☆1,032 · Jan 15, 2026 · Updated 4 months ago
- Keyword Search Recipe for Subword ASR ☆30 · Jul 12, 2019 · Updated 6 years ago
- Simplistic Implementation of Zipformer: A faster and better encoder for automatic speech recognition in PyTorch ☆21 · Jun 3, 2024 · Updated last year
- SoTA open-source TTS ☆26 · Jul 8, 2025 · Updated 10 months ago
- PolEval 2021 Task 1 ☆15 · Jun 28, 2022 · Updated 3 years ago
- CTC decoder with hotwords for ASR. ☆36 · Apr 13, 2025 · Updated last year
- ☆37 · Jun 30, 2022 · Updated 3 years ago
- A neural speech codec based on discrete WavLM representations ☆26 · Aug 28, 2024 · Updated last year