teamtee/Qwen2-Audio-finetune

teamtee / Qwen2-Audio-finetune

This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.

☆50

Alternatives and similar repositories for Qwen2-Audio-finetune

Users who are interested in Qwen2-Audio-finetune are comparing it to the repositories listed below.

jonflynng / qwen2-audio-finetune
Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
☆24 • Nov 23, 2024 • Updated last year
mubingshen / MLC-SLM-Baseline
The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…
☆51 • May 14, 2025 • Updated last year
xiaomi-research / r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
☆325 • Mar 28, 2025 • Updated last year
teamtee / LLM-ASR-Error-Correction
This is a framework for using large language models to improve ASR recognition accuracy. You need to provide the recognized text and tag …
☆18 • Jun 5, 2025 • Updated last year
pengzhendong / audio-pipeline
☆23 • Oct 17, 2024 • Updated last year
MysticShadow427 / simplistic-zipformer
Simplistic Implementation of Zipformer: A faster and better encoder for automatic speech recognition in PyTorch
☆22 • Jun 3, 2024 • Updated 2 years ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 • Feb 25, 2026 • Updated 5 months ago
zruiii / QwenAudioSFT
The reproduction code for Qwen-Audio Fine-tuning
☆55 • Feb 28, 2026 • Updated 5 months ago
wonjune-kang / expressive-speech-retrieval
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
☆15 • Aug 18, 2025 • Updated 11 months ago
pengzhendong / asr-decoder
CTC decoder with hotwords for ASR.
☆38 • Jun 15, 2026 • Updated last month
fireredchat-submodules / livekit-plugins-fireredchat-pvad
FireRedChat pVAD plugin for LiveKit Agents
☆22 • Sep 16, 2025 • Updated 10 months ago
lucadellalib / ts-asr
Target speaker automatic speech recognition (TS-ASR)
☆14 • Oct 14, 2023 • Updated 2 years ago
ZZDoog / ProDubber
[CVPR 2025] Official implementation of paper "Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie…
☆23 • Jun 6, 2025 • Updated last year
PigeonDan1 / ps-slm
TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks
☆27 • Jul 20, 2026 • Updated last week
hhhaaahhhaa / ASR-TTA
☆16 • Nov 4, 2025 • Updated 8 months ago
Anvarjon / Age-Gender-Classification
Official implementation of the paper titled "Age and Gender Recognition Using a Convolutional Neural Network with a Specially Designed Mu…
☆28 • Mar 5, 2024 • Updated 2 years ago
ShiningLab / POS-Tagger-for-Punctuation-Restoration
This repository is for the paper Incorporating External POS Tagger for Punctuation Restoration. Proc. Interspeech 2021, 1987-1991, doi: 1…
☆11 • May 24, 2026 • Updated 2 months ago
EIT-NLP / LLaSO
☆116 • Oct 21, 2025 • Updated 9 months ago
HarryHsing / EchoInk
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning (🔥The Exploration of R1 for General Audio-Vis…
☆78 • Jun 3, 2026 • Updated last month
pengzhendong / streaming-asr
One command to start a streaming ASR server.
☆12 • Oct 2, 2024 • Updated last year
AmphionTeam / SpeechJudge
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
☆79 • Dec 23, 2025 • Updated 7 months ago
jh-cha-prml / JELLY
Code for the paper "JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis"
☆14 • Nov 5, 2024 • Updated last year
kehanlu / DeSTA2.5-Audio
Code for DeSTA2.5-Audio, general-purpose LALM
☆141 • Feb 4, 2026 • Updated 5 months ago
ddlBoJack / Omni-Captioner
[ICLR 2026] Data Pipeline, Models, and Benchmark for Omni-Captioner.
☆142 • Apr 7, 2026 • Updated 3 months ago
hongfeixue / StutteringSpeechChallenge
SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
☆12 • Jun 11, 2024 • Updated 2 years ago
MrSupW / ICMC-ASR_Baseline
The baseline system for the ICASSP2024 ICMC-ASR Challenge.
☆57 • Dec 6, 2023 • Updated 2 years ago
a43992899 / openl2s
Open, royalty-free lyrics2song / song generation data collection / cleaning pipeline.
☆17 • May 9, 2025 • Updated last year
wonjune-kang / llm-speech-summarization
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆20 • May 14, 2025 • Updated last year
ASLP-lab / FMSU-Bench
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25 • May 21, 2026 • Updated 2 months ago
X-LANCE / SLAM-LLM
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
☆1,050 • Jan 15, 2026 • Updated 6 months ago
Audio-Reasoning-Challenge / Audio-Reasoning-Challenge-Baselines
The baselines of ARC-Challenge-Interspeech2026
☆60 • Dec 1, 2025 • Updated 7 months ago
BUTSpeechFIT / TS_SUPERB
☆16 • Apr 2, 2025 • Updated last year
k2-fsa / sherpa-mlx
sherpa with mlx
☆15 • Aug 2, 2025 • Updated 11 months ago
R1ckShi / FrontEnd-AEC
Acoustic echo cancellation (AEC) is a key algorithm in the pipeline of acoustic devices with KWS or ASR. FNLMS is used.
☆19 • Apr 22, 2019 • Updated 7 years ago
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 • Jul 4, 2024 • Updated 2 years ago
KTTRCDL / UMETTS
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts
☆41 • Jun 12, 2025 • Updated last year
zqlsnr / DPCRN
Real-time speech enhancement
☆18 • Jan 23, 2024 • Updated 2 years ago
walker-hyf / FCTalker
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)
☆26 • Feb 22, 2024 • Updated 2 years ago
MrSupW / ContextASR-Bench
A Massive Contextual Speech Recognition Benchmark.
☆107 • Aug 6, 2025 • Updated 11 months ago