This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆51Jul 28, 2025Updated 8 months ago
Alternatives and similar repositories for Qwen2-Audio-finetune
Users that are interested in Qwen2-Audio-finetune are comparing it to the libraries listed below.
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…☆50May 14, 2025Updated 10 months ago
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts☆42Jun 12, 2025Updated 9 months ago
- Target speaker automatic speech recognition (TS-ASR)☆12Oct 14, 2023Updated 2 years ago
- The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.☆38Sep 8, 2024Updated last year
- Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"☆14Nov 5, 2024Updated last year
- ☆114Oct 21, 2025Updated 5 months ago
- ☆23Oct 17, 2024Updated last year
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge☆12Jun 11, 2024Updated last year
- Llama-Mimi is a speech language model that uses a unified tokenizer (Mimi) and a single Transformer decoder (Llama) to jointly model sequ…☆30Sep 20, 2025Updated 6 months ago
- ☆15Apr 2, 2025Updated 11 months ago
- Prompting Large Language Models with Audio for General-Purpose Speech Summarization☆20May 14, 2025Updated 10 months ago
- [ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.☆124Mar 18, 2026Updated last week
- [NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix☆200Feb 25, 2026Updated last month
- 🤗 R1-AQA Model: mispeech/r1-aqa☆318Mar 28, 2025Updated last year
- Real-time speech enhancement☆17Jan 23, 2024Updated 2 years ago
- [CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…☆23Jun 6, 2025Updated 9 months ago
- Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…☆27Mar 5, 2024Updated 2 years ago
- ☆15Apr 4, 2025Updated 11 months ago
- ☆11Oct 20, 2022Updated 3 years ago
- This repository documents Barry's journey in learning deep learning for speech processing. Here, you'll find scripts and code snippets re…☆13Oct 8, 2025Updated 5 months ago
- Understanding and Tackling Hallucinations in Large Audio-Language Models | ICASSP 2025, Interspeech 2024☆32Mar 14, 2025Updated last year
- A Chinese Expressive Long-dialogue Speech Dataset with Scripts☆21Nov 11, 2024Updated last year
- Vox-Profile Benchmark☆74Feb 16, 2026Updated last month
- small audio language model for reasoning☆85Dec 4, 2025Updated 3 months ago
- MOSS-Speech is a true speech-to-speech large language model without text guidance.☆127Feb 13, 2026Updated last month
- The baselines of ARC-Challenge-Interspeech2026☆57Dec 1, 2025Updated 3 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit…☆1,204Dec 17, 2025Updated 3 months ago
- The baseline system for the ICASSP2024 ICMC-ASR Challenge.☆55Dec 6, 2023Updated 2 years ago
- The reproduction code for Qwen-Audio fine-tuning☆53Feb 28, 2026Updated last month
- A Framework for Speech, Language, Audio, Music Processing with Large Language Model☆1,013Jan 15, 2026Updated 2 months ago
- LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models☆27Aug 11, 2024Updated last year
- Keyword Search Recipe for Subword ASR☆30Jul 12, 2019Updated 6 years ago
- Simplistic Implementation of Zipformer: A faster and better encoder for automatic speech recognition in PyTorch☆19Jun 3, 2024Updated last year
- SoTA open-source TTS☆26Jul 8, 2025Updated 8 months ago
- This repository contains the baseline system for CHiME-8 MMCSG challenge focusing on transcribing both sides of a conversation where one …☆40Mar 13, 2024Updated 2 years ago
- PolEval 2021 Task 1☆15Jun 28, 2022Updated 3 years ago
- CTC decoder with hotwords for ASR.☆35Apr 13, 2025Updated 11 months ago
- ☆37Jun 30, 2022Updated 3 years ago