Reproduction code for Qwen-Audio fine-tuning
☆53 · Feb 28, 2026 · Updated 2 months ago
Alternatives and similar repositories for QwenAudioSFT
Users that are interested in QwenAudioSFT are comparing it to the libraries listed below.
- Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers. ☆24 · Nov 23, 2024 · Updated last year
- Source code for ACL 2023 paper "End-to-End Simultaneous Speech Translation with Differentiable Segmentation" ☆37 · Dec 6, 2023 · Updated 2 years ago
- MTalk-Bench: Evaluating Speech-to-Speech Models in Multi-Turn Dialogues via Arena-style and Rubrics Protocols ☆18 · Nov 19, 2025 · Updated 5 months ago
- Acoustic echo cancellation (AEC) is a key algorithm in the pipeline of acoustic devices with KWS or ASR. FNLMS is used. ☆19 · Apr 22, 2019 · Updated 7 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of detail using Audio-Language-Models ☆35 · Oct 13, 2024 · Updated last year
- ☆23 · Oct 17, 2024 · Updated last year
- ☆36 · Jan 6, 2026 · Updated 3 months ago
- The world's fastest Python package for calculating integrated loudness (LUFS) from audio data as NumPy arrays ☆26 · Dec 26, 2025 · Updated 4 months ago
- Python runtime for WeTextProcessing (does not depend on Pynini) ☆49 · Nov 28, 2025 · Updated 5 months ago
- Vocoder NSF-HiFiGAN (Moved into deepaudio) ☆56 · Dec 11, 2022 · Updated 3 years ago
- CTC decoder with hotwords for ASR. ☆35 · Apr 13, 2025 · Updated last year
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. ☆52 · Jul 28, 2025 · Updated 9 months ago
- Collection of scripts from mHuBERT-147. ☆34 · Nov 19, 2024 · Updated last year
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities ☆154 · Dec 5, 2024 · Updated last year
- Your faithful, impartial partner for authentic audio evaluation: know yourself, know your rivals. ☆290 · Apr 20, 2026 · Updated last week
- 《SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks》Speech processing with prompting paradigm ☆81 · Oct 19, 2023 · Updated 2 years ago
- Unified Audio-Visual Perception for Multi-Task Video Localization ☆31 · Apr 19, 2024 · Updated 2 years ago
- Reimplementation of Miipher ☆30 · Aug 16, 2023 · Updated 2 years ago
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX. ☆15 · Dec 23, 2024 · Updated last year
- GitHub repository for ACL 2025 paper: VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models ☆24 · Jun 16, 2025 · Updated 10 months ago
- ☆68 · Dec 30, 2025 · Updated 4 months ago
- ☆15 · Nov 11, 2024 · Updated last year
- Repository for "Training Audio Captioning Models without Audio" ☆10 · Sep 26, 2023 · Updated 2 years ago
- This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples a… ☆655 · Jun 9, 2024 · Updated last year
- 《SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts》 ☆77 · Jun 9, 2023 · Updated 2 years ago
- ☆134 · Jul 21, 2021 · Updated 4 years ago
- SpeechFake: A Large-Scale Multilingual Speech Deepfake Dataset Incorporating Cutting-Edge Generation Methods ☆27 · Aug 13, 2025 · Updated 8 months ago
- ☆21 · Apr 24, 2025 · Updated last year
- [INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark ☆316 · Mar 18, 2026 · Updated last month
- AudioCodec-Hub is a Python library for encoding and decoding audio data, supporting various neural audio codec models ☆25 · Sep 26, 2023 · Updated 2 years ago
- The open source code for LLM-Codec ☆147 · Aug 18, 2024 · Updated last year
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English. ☆61 · Sep 5, 2025 · Updated 7 months ago
- Audio-FLAN ☆160 · Sep 23, 2025 · Updated 7 months ago
- ☆62 · May 31, 2024 · Updated last year
- Towards Comprehensive Evaluation for End-to-End Spoken Dialogue Models ☆53 · Sep 2, 2025 · Updated 7 months ago
- Project for HIDING SPEAKER’S SEX IN SPEECH USING ZERO-EVIDENCE SPEAKER REPRESENTATION IN AN ANALYSIS/SYNTHESIS PIPELINE ☆15 · Nov 30, 2022 · Updated 3 years ago
- ☆26 · Sep 22, 2022 · Updated 3 years ago
- Code release for "MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning" ☆11 · Oct 11, 2024 · Updated last year
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models ☆31 · Feb 13, 2026 · Updated 2 months ago