Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
☆24Nov 23, 2024Updated last year
Alternatives and similar repositories for qwen2-audio-finetune
Users that are interested in qwen2-audio-finetune are comparing it to the libraries listed below
- A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)☆38Updated this week
- Open-source Mandarin biased word dataset☆14Sep 21, 2023Updated 2 years ago
- ☆10Sep 25, 2024Updated last year
- ☆15Apr 4, 2025Updated 11 months ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…☆28Jul 11, 2025Updated 7 months ago
- ☆13Mar 30, 2023Updated 2 years ago
- Implementation and experiment of the MusGConv paper.☆15Sep 6, 2024Updated last year
- Lightweight utilities for music source separation and transcription.☆34Feb 24, 2026Updated last week
- ☆15Jul 4, 2024Updated last year
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed.☆50Jul 28, 2025Updated 7 months ago
- Towards a general language-audio model for computational paralinguistic tasks☆24Dec 14, 2024Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline☆196Dec 13, 2024Updated last year
- The reproduction code for Qwen-Audio fine-tuning☆53Feb 28, 2026Updated last week
- Keyword spotting for audio with attention (KWS model for audio)☆18Jul 15, 2021Updated 4 years ago
- music semantic understanding evaluation benchmark☆25Aug 12, 2023Updated 2 years ago
- ☆23Oct 17, 2024Updated last year
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)☆94Dec 3, 2024Updated last year
- Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training☆68Feb 7, 2026Updated last month
- ☆68Dec 30, 2025Updated 2 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆87Jan 4, 2026Updated 2 months ago
- Production first, nn-based on-device signal processing toolkit.☆65May 30, 2023Updated 2 years ago
- ☆24Sep 10, 2025Updated 6 months ago
- Uyghur Single Speaker Speech Dataset☆34Apr 3, 2022Updated 3 years ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities☆153Dec 5, 2024Updated last year
- Based on Neural Amp Modeler 0.7.1 with some enhanced features☆12Apr 18, 2023Updated 2 years ago
- Speech Emotion Recognition using Deep Learning☆12May 24, 2021Updated 4 years ago
- Detecting and correcting dysfluencies/stuttering/stammering in audio files☆10Apr 23, 2023Updated 2 years ago
- Non-parallel voice conversion called ICRCycleGAN-VC based on CycleGAN and Inception-resNet module by Afiuny☆15Oct 30, 2025Updated 4 months ago
- A Swift library that makes it easier to create AVAudioEngine-based audio players☆11Oct 14, 2023Updated 2 years ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆97Nov 9, 2024Updated last year
- PyTorch implementation of DiffRoll, a diffusion-based generative automatic music transcription (AMT) model☆80Dec 6, 2023Updated 2 years ago
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications☆88Dec 20, 2024Updated last year
- Efficient audio understanding with general audio captions☆403Nov 3, 2025Updated 4 months ago
- Open repository of simulated Room Impulse Responses (RIR) accompanying the paper "Hearing Anywhere in Any Environment"☆71Aug 11, 2025Updated 6 months ago
- ☆37Jul 4, 2024Updated last year
- Code for the ICASSP 2024 paper: BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer. An online beat tracking syste…☆42Sep 11, 2024Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆54May 15, 2025Updated 9 months ago
- Speaker embedding for VI-SVC and VI-SVS, also for VITS; use this to replace the speaker ID to implement voice cloning.☆30Sep 16, 2022Updated 3 years ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆27Feb 13, 2026Updated 3 weeks ago