jonflynng / qwen2-audio-finetune
Colab notebook for fine-tuning Qwen2-Audio with trl's SFT and PPO trainers.
☆24 · Nov 23, 2024 · Updated last year
Alternatives and similar repositories for qwen2-audio-finetune
Users who are interested in qwen2-audio-finetune are comparing it to the repositories listed below
- A streaming audio reader, processor, and writer built on top of soundfile and PyAV (bindings for FFmpeg) · ☆37 · Jan 15, 2026 · Updated last month
- ☆11 · Sep 25, 2024 · Updated last year
- Lightweight utilities for music source separation. · ☆28 · Aug 21, 2025 · Updated 5 months ago
- ☆14 · Apr 4, 2025 · Updated 10 months ago
- [EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E… · ☆27 · Jul 11, 2025 · Updated 7 months ago
- Implementation and experiment of the MusGConv paper. · ☆15 · Sep 6, 2024 · Updated last year
- ☆15 · Jul 4, 2024 · Updated last year
- This is a repository for fine-tuning Qwen2-Audio, currently supporting Distributed Data Parallel (DDP) and DeepSpeed. · ☆49 · Jul 28, 2025 · Updated 6 months ago
- Towards a general language-audio model for computational paralinguistic tasks · ☆23 · Dec 14, 2024 · Updated last year
- A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline · ☆195 · Dec 13, 2024 · Updated last year
- The reproduction code for Qwen-Audio Fine-tuning · ☆53 · Aug 15, 2024 · Updated last year
- ☆23 · Oct 17, 2024 · Updated last year
- Music semantic understanding evaluation benchmark · ☆25 · Aug 12, 2023 · Updated 2 years ago
- Open-Ended Speaking Style Modeling via Fine-Grained and Multi-Granular Contrastive Language-Speech Pre-training · ☆64 · Feb 7, 2026 · Updated last week
- Keyword spotting for audio with attention (KWS model for audio) · ☆18 · Jul 15, 2021 · Updated 4 years ago
- Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024) · ☆94 · Dec 3, 2024 · Updated last year
- ☆68 · Dec 30, 2025 · Updated last month
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues · ☆86 · Jan 4, 2026 · Updated last month
- Production-first, NN-based on-device signal processing toolkit. · ☆65 · May 30, 2023 · Updated 2 years ago
- ☆24 · Sep 10, 2025 · Updated 5 months ago
- Uyghur Single Speaker Speech Dataset (Uyghur speech dataset) · ☆33 · Apr 3, 2022 · Updated 3 years ago
- Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities · ☆152 · Dec 5, 2024 · Updated last year
- A Swift library that makes it easier to create AVAudioEngine-based audio players · ☆11 · Oct 14, 2023 · Updated 2 years ago
- Detecting and correcting dysfluencies/stuttering/stammering in audio files · ☆10 · Apr 23, 2023 · Updated 2 years ago
- Speech Emotion Recognition using Deep Learning · ☆12 · May 24, 2021 · Updated 4 years ago
- Non-parallel voice conversion called ICRCycleGAN-VC based on CycleGAN and Inception-resNet module by Afiuny · ☆15 · Oct 30, 2025 · Updated 3 months ago
- Based on Neural Amp Modeler 0.7.1 with some enhanced features · ☆12 · Apr 18, 2023 · Updated 2 years ago
- Efficient audio understanding with general audio captions · ☆398 · Nov 3, 2025 · Updated 3 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling · ☆96 · Nov 9, 2024 · Updated last year
- Ultra-low-bitrate Speech Codec for Speech Language Modeling Applications · ☆87 · Dec 20, 2024 · Updated last year
- PyTorch implementation of DiffRoll, a diffusion-based generative automatic music transcription (AMT) model · ☆80 · Dec 6, 2023 · Updated 2 years ago
- Open repository of simulated Room Impulse Responses (RIR) accompanying the paper "Hearing Anywhere in Any Environment" · ☆69 · Aug 11, 2025 · Updated 6 months ago
- ☆37 · Jul 4, 2024 · Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators · ☆54 · May 15, 2025 · Updated 9 months ago
- Code for the ICASSP 2024 paper: BEAST: Online Joint Beat and Downbeat Tracking Based on Streaming Transformer. An online beat tracking syste… · ☆41 · Sep 11, 2024 · Updated last year
- Examples of how to achieve OpenGL streaming with Qt, WebSockets, etc. · ☆13 · May 25, 2016 · Updated 9 years ago
- Code for Dynamic Cast workshop at Audio Developer Conference 2024: Practical Machine Learning · ☆17 · Nov 11, 2024 · Updated last year
- ☆10 · Oct 20, 2022 · Updated 3 years ago
- Keyword extraction using Scake, KeyBERT, fine-tuning of BERT-like Transformer models, and ChatGPT. · ☆12 · May 22, 2023 · Updated 2 years ago