zruiii/QwenAudioSFT

The reproduction code for Qwen-Audio fine-tuning

☆55

Alternatives and similar repositories for QwenAudioSFT

Users who are interested in QwenAudioSFT are comparing it to the libraries listed below.

changelinglab / prism
A toolkit and benchmark for evaluating phonetic capabilities of speech models.
☆18 · Apr 10, 2026 · Updated 3 months ago
jonflynng / qwen2-audio-finetune
Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
☆24 · Nov 23, 2024 · Updated last year
SparkAudio / SparkVox
☆37 · Jun 9, 2025 · Updated last year
vTAD2025-Challenge / vTAD
☆16 · Oct 24, 2025 · Updated 8 months ago
pengzhendong / torchfa
Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.
☆61 · Sep 5, 2025 · Updated 10 months ago
NARUTO-2024 / WavBench
WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models
☆34 · Feb 13, 2026 · Updated 5 months ago
ictnlp / DiSeg
Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"
☆37 · Dec 6, 2023 · Updated 2 years ago
R1ckShi / FrontEnd-AEC
Acoustic echo cancellation (AEC) is a main algorithm in the pipeline of acoustic devices with KWS or ASR. FNLMS is used.
☆19 · Apr 22, 2019 · Updated 7 years ago
egruttadauria98 / SSpaVAlDo
☆37 · Jan 6, 2026 · Updated 6 months ago
LAION-AI / emotional-speech-annotations
This repository contains prompts & best practices to annotate audio clips with a very high degree of detail using Audio-Language-Models
☆35 · Oct 13, 2024 · Updated last year
pengzhendong / audio-pipeline
☆23 · Oct 17, 2024 · Updated last year
zhu-han / SpeechLLM
LLM-based ASR recipe with Zipformer encoder and Qwen LLM
☆34 · Sep 25, 2025 · Updated 9 months ago
vtuber-plan / NSF-HiFiGAN
Vocoder NSF-HiFiGAN (Moved into deepaudio)
☆56 · Dec 11, 2022 · Updated 3 years ago
iver56 / loudness
The world's fastest Python package for calculating integrated loudness (LUFS) from audio data as NumPy arrays
☆31 · Dec 26, 2025 · Updated 6 months ago
teamtee / Qwen2-Audio-finetune
This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.
☆50 · Jul 28, 2025 · Updated 11 months ago
Sreyan88 / GAMA
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
☆153 · Dec 5, 2024 · Updated last year
rishikksh20 / voxtral-codec-pytoch
Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)
☆15 · Mar 27, 2026 · Updated 3 months ago
FreedomIntelligence / ExpressiveSpeech
☆17 · Jun 10, 2026 · Updated last month
ga642381 / SpeechPrompt-v2
《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm
☆81 · Oct 19, 2023 · Updated 2 years ago
wenet-e2e / west
We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
☆206 · Updated this week
ajaybati / miipher2.0
Reimplementation of Miipher
☆30 · Aug 16, 2023 · Updated 2 years ago
EIT-NLP / LLaSO
☆116 · Oct 21, 2025 · Updated 9 months ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33 · Apr 19, 2024 · Updated 2 years ago
microsoft / NoAudioCaptioning
Repository for "Training Audio Captioning Models without Audio"
☆10 · Sep 26, 2023 · Updated 2 years ago
YMLLG / SPEECHFAKE
SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods
☆28 · Aug 13, 2025 · Updated 11 months ago
ga642381 / SpeechGen
《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》
☆77 · Jun 9, 2023 · Updated 3 years ago
felixfuyihui / AISHELL-4
☆140 · Jul 21, 2021 · Updated 5 years ago
OpenBMB / UltraEval-Audio
Your faithful, impartial partner for audio evaluation: know yourself, know your rivals. A unified benchmark framework for ASR/…
☆309 · Updated this week
ZhangXInFD / SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…
☆658 · Jun 9, 2024 · Updated 2 years ago
emo-box / EmoBox
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
☆321 · Mar 18, 2026 · Updated 4 months ago
dhimasryan / MOSA-Net-Cross-Domain
☆63 · May 31, 2024 · Updated 2 years ago
VickiCui / MORE
Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"
☆11 · Oct 11, 2024 · Updated last year
yangdongchao / LLM-Codec
The open source code for LLM-Codec
☆147 · Aug 18, 2024 · Updated last year
ga642381 / AudioCodec-Hub
AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models
☆25 · Sep 26, 2023 · Updated 2 years ago
nii-yamagishilab / speaker_sex_attribute_privacy
Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE
☆15 · Nov 30, 2022 · Updated 3 years ago
lmxue / Audio-FLAN
Audio-FLAN
☆161 · Sep 23, 2025 · Updated 9 months ago
CODEJIN / VITS_Diffusion
☆26 · Sep 22, 2022 · Updated 3 years ago
shenduldh / CosyVoice-Lightning
Lightning-responsive CosyVoice streaming API based on FastAPI.
☆28 · Apr 27, 2026 · Updated 2 months ago
dreamtheater123 / VoxEval
GitHub repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
☆24 · Jun 16, 2025 · Updated last year