jonflynng/qwen2-audio-finetune

Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.

☆24

Alternatives and similar repositories for qwen2-audio-finetune

Users who are interested in qwen2-audio-finetune are comparing it to the repositories listed below.

teamtee / Qwen2-Audio-finetune
This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆50 · Jul 28, 2025 · Updated last year
Kirili4ik / kws-attention-pytorch
Keyword spotting for audio with attention (KWS model for audio)
☆18 · Jul 15, 2021 · Updated 5 years ago
anthony-wss / glm-4-voice-finetune
☆14 · Apr 4, 2025 · Updated last year
JishengBai / AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
☆208 · Dec 13, 2024 · Updated last year
frankenliu / LOAE
☆10 · Sep 25, 2024 · Updated last year
lysanderism / TimeAudio
The official repository of TimeAudio, a comprehensive framework that incorporates fine-grained acoustic cues into LALMs with enhanced module…
☆30 · Nov 18, 2025 · Updated 8 months ago
manoskary / MusGConv
Implementation and experiments for the MusGConv paper.
☆15 · Sep 6, 2024 · Updated last year
thuhcsi / Contextual-Biasing-Dataset
Open-source Mandarin biased-word dataset
☆14 · Sep 21, 2023 · Updated 2 years ago
xiaomi-research / r1-aqa
🤗 R1-AQA Model: mispeech/r1-aqa
☆325 · Mar 28, 2025 · Updated last year
zruiii / QwenAudioSFT
The reproduction code for Qwen-Audio fine-tuning
☆55 · Feb 28, 2026 · Updated 5 months ago
pengzhendong / audio-pipeline
☆23 · Oct 17, 2024 · Updated last year
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
Mddct / WeUSM
☆13 · Mar 30, 2023 · Updated 3 years ago
WalkerMitty / Fast-Llama2
Fast instruction tuning with Llama2
☆10 · Apr 8, 2024 · Updated 2 years ago
pengzhendong / audiolab
A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg)
☆39 · Mar 31, 2026 · Updated 3 months ago
microsoft / POLAR
Experiments for "Automatic Calibration and Error Correction for Large Language Models via Pareto Optimal Self-Supervision"
☆14 · Aug 4, 2023 · Updated 2 years ago
naver-ai / usdm
Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)
☆95 · Dec 3, 2024 · Updated last year
dominickrei / MatchboxNet
An implementation of MatchboxNet
☆13 · May 4, 2022 · Updated 4 years ago
i3thuan5 / hts_engine_python
Python wrapper for hts engine
☆14 · Jan 30, 2018 · Updated 8 years ago
juanmc2005 / SimilarityLearning
Similarity Learning applied to Speaker Verification and Semantic Textual Similarity
☆13 · Apr 8, 2020 · Updated 6 years ago
asappresearch / simple-tts
Contains the code associated with the ICLR submission for our text-to-speech diffusion model
☆57 · Oct 31, 2023 · Updated 2 years ago
xinchen-ai / Westlake-Omni
☆203 · Sep 24, 2024 · Updated last year
deepvk / muse
🎵 muse: Music Separation
☆11 · Feb 14, 2024 · Updated 2 years ago
hanshounsu / d3rm
☆14 · Feb 3, 2026 · Updated 5 months ago
seungheondoh / msu-benchmark
Music semantic understanding evaluation benchmark
☆24 · Aug 12, 2023 · Updated 2 years ago
aidanmomo / Speech-Enhancement-Metrics-SNR-SDRi-SISDRi
☆10 · Apr 7, 2022 · Updated 4 years ago
cwang621 / blsp-emo
BLSP-Emo: Towards Empathetic Large Speech-Language Models
☆62 · Jun 7, 2024 · Updated 2 years ago
TeaPoly / CE-OptimizedLoss
Optimized loss based on cross-entropy (CE), like MWER (minimum WER) Loss with beam search and negative sampling strategy, Smoothed Max Po…
☆25 · Oct 11, 2024 · Updated last year
ASLP-lab / FMSU-Bench
Towards Fine-Grained Multi-Dimensional Speech Understanding: Data Pipeline, Benchmark, and Model
☆25 · May 21, 2026 · Updated 2 months ago
fakufaku / create_wsj1_2345_db
Collection of scripts to create a dataset of noisy multi-channel reverberant mixtures based on wsj1 and CHiME3 datasets.
☆15 · Dec 6, 2021 · Updated 4 years ago
thuhcsi / SpeechCraft
The official repository of SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆198 · Feb 28, 2026 · Updated 5 months ago
fakufaku / auxiva-ipa
Fast algorithm for determined blind source separation with update of demixing filters with joint adjustment of the remaining sources.
☆36 · Mar 22, 2021 · Updated 5 years ago
yuhangsu82 / AMG-Embedding
A self-supervised method for feature extraction from audio.
☆21 · Apr 9, 2026 · Updated 3 months ago
crazykun / feishu-bot-markdown
Feishu Markdown message templates (go-feishu-bot-markdown)
☆15 · Apr 7, 2026 · Updated 3 months ago
ddlBoJack / MMAR
[NeurIPS 2025] Benchmark data and code for MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
☆214 · Feb 25, 2026 · Updated 5 months ago
Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆153 · Dec 5, 2024 · Updated last year
lavendery / AudioComposer
☆27 · Sep 10, 2025 · Updated 10 months ago
LexiestLeszek / web-search-ollama-qwen-local
Local LLM web search using a Qwen model and Ollama
☆15 · Feb 9, 2024 · Updated 2 years ago
jymh / SAP2-ASR
☆26 · Jan 23, 2026 · Updated 6 months ago