The repoduction codes for Qwen-Audio Fine-tuning
☆54Feb 28, 2026Updated 2 months ago
Alternatives and similar repositories for QwenAudioSFT
Users that are interested in QwenAudioSFT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.☆24Nov 23, 2024Updated last year
- Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation"☆37Dec 6, 2023Updated 2 years ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols☆19Nov 19, 2025Updated 6 months ago
- Acoustic echo cancelation(AEC) is a main algorithm in the pipe line of acoustic devices with KWS or ASR. FNLMS is used.☆19Apr 22, 2019Updated 7 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆23Oct 17, 2024Updated last year
- ☆36Jan 6, 2026Updated 4 months ago
- The world's fastest Python package for calculating integrated loudness (LUFS) from audio data as NumPy arrays☆26Dec 26, 2025Updated 4 months ago
- Python runtime for WeTextProcessing (does not depend on Pynini)☆51Nov 28, 2025Updated 5 months ago
- Vocoder NSF-HiFiGAN (Moved into deepaudio)☆56Dec 11, 2022Updated 3 years ago
- CTC decoder with hotwords for ASR.☆36Apr 13, 2025Updated last year
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.☆51Jul 28, 2025Updated 9 months ago
- Collection of scripts from mHuBERT-147.☆35Nov 19, 2024Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆154Dec 5, 2024Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Your faithful, impartial partner for audio evaluation — know yourself, know your rivals. 真实评测,知己知彼。☆298Apr 20, 2026Updated last month
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm☆81Oct 19, 2023Updated 2 years ago
- Unified Audio-Visual Perception for Multi-Task Video Localization☆31Apr 19, 2024Updated 2 years ago
- Reimplementation of Miipher☆30Aug 16, 2023Updated 2 years ago
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX.☆15Dec 23, 2024Updated last year
- ☆71Dec 30, 2025Updated 4 months ago
- Github repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models☆24Jun 16, 2025Updated 11 months ago
- ☆15Nov 11, 2024Updated last year
- Repository for "Training Audio Captioning Models without Audio"☆10Sep 26, 2023Updated 2 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》☆77Jun 9, 2023Updated 2 years ago
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a…☆658Jun 9, 2024Updated last year
- ☆134Jul 21, 2021Updated 4 years ago
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods☆27Aug 13, 2025Updated 9 months ago
- ☆21Apr 24, 2025Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark☆319Mar 18, 2026Updated 2 months ago
- AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models☆25Sep 26, 2023Updated 2 years ago
- The open source code for LLM-Codec☆147Aug 18, 2024Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 8 months ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- Audio-FLAN☆160Sep 23, 2025Updated 7 months ago
- ☆62May 31, 2024Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models☆55Sep 2, 2025Updated 8 months ago
- Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE☆15Nov 30, 2022Updated 3 years ago
- ☆26Sep 22, 2022Updated 3 years ago
- Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning"☆11Oct 11, 2024Updated last year
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆31Feb 13, 2026Updated 3 months ago