antirez / qwen-asr
C inference for Qwen3-ASR 0.6b and 1.7b transcription models
☆216 · Updated this week
Alternatives and similar repositories for qwen-asr
Users who are interested in qwen-asr are comparing it to the libraries listed below.
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 7 months ago
- FlowMirror-HydraVox: A natively accelerated multi-head autoregressive TTS system derived from CosyVoice 3.0. It predicts multiple tokens… ☆38 · Jan 22, 2026 · Updated 3 weeks ago
- Cut2Next: Generating Next Shot via In-Context Tuning ☆31 · Aug 21, 2025 · Updated 5 months ago
- ☆23 · Feb 2, 2022 · Updated 4 years ago
- ☆54 · Dec 8, 2025 · Updated 2 months ago
- Here we will track the latest Audio AI Agent, including speech, music, sound effects, etc. ☆16 · Dec 8, 2023 · Updated 2 years ago
- implementation of https://arxiv.org/pdf/2312.09299 ☆21 · Jul 3, 2024 · Updated last year
- The ✨Magical✨ JAX ML Library. ☆18 · Jan 25, 2025 · Updated last year
- MMD viewer powered by Babylon.js and babylon-mmd ☆16 · Aug 2, 2025 · Updated 6 months ago
- Official Repository of Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene (ICCV 2025) ☆26 · Nov 11, 2025 · Updated 3 months ago
- PyTorch implementation of the ICASSP-24 paper: "Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Superv… ☆38 · Jan 6, 2024 · Updated 2 years ago
- ☆124 · Feb 6, 2025 · Updated last year
- Implementation of HS-TasNet, "Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet" ☆86 · Nov 27, 2025 · Updated 2 months ago
- ☆17 · Nov 6, 2025 · Updated 3 months ago
- Code for 'JUST-DUB-IT: Video Dubbing via Joint Audio-Visual Diffusion' ☆38 · Updated this week
- Variable Bitrate Residual Vector Quantization for Audio Coding ☆51 · May 1, 2025 · Updated 9 months ago
- LLM Benchmark ☆39 · May 24, 2025 · Updated 8 months ago
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control ☆30 · Jan 13, 2026 · Updated last month
- Baseline for DCASE 2024 Task 9: "Language-Queried Audio Source Separation" ☆26 · Mar 27, 2024 · Updated last year
- [CVPR 2025] Official code of "PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation" ☆39 · Mar 18, 2025 · Updated 10 months ago
- My hybrid TTS network that combines, VALL-E, VoiceBox, SpeechFlow, Seamless and TortoiseTTS into one ☆26 · Aug 5, 2024 · Updated last year
- [ECCV 2024] PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance ☆23 · Jul 25, 2024 · Updated last year
- "Fx-Encoder++: Extracting Instrument-wise Audio Effect Representations from Mixtures" ☆47 · Aug 23, 2025 · Updated 5 months ago
- ☆68 · Dec 30, 2025 · Updated last month
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 · Jul 12, 2023 · Updated 2 years ago
- spatial intelligence; interactive 3D scene generation; world model ☆62 · Dec 16, 2025 · Updated last month
- ☆25 · Mar 6, 2024 · Updated last year
- [TAFFC 2025] The official implementation of EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vec… ☆118 · Sep 7, 2025 · Updated 5 months ago
- Code for the EGSR 2025 Paper "Content-Aware Texturing for Gaussian Splatting" ☆83 · Jan 8, 2026 · Updated last month
- ☆31 · Jun 15, 2024 · Updated last year
- 3D Gaussian Flats: Hybrid 2D/3D Photometric Scene Reconstruction ☆57 · Nov 26, 2025 · Updated 2 months ago
- Official repository for code and information related to the HumanOLAT dataset (ICCV 2025). ☆38 · Nov 17, 2025 · Updated 2 months ago
- E2E TTS using Conditional Flow Matching (Experimental*) ☆71 · Nov 10, 2023 · Updated 2 years ago
- An open-source Kazakh Emotional Text-to-Speech Dataset ☆35 · Aug 1, 2025 · Updated 6 months ago
- [ICCV 2025] LightSwitch: Multi-view Relighting with Material-guided Diffusion ☆61 · Aug 13, 2025 · Updated 6 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · May 4, 2025 · Updated 9 months ago
- OpenFLAM: Framewise Language Audio Model ☆88 · Jan 14, 2026 · Updated last month
- [DEPRECIATED] [PyTorch 2.0] [638M] [85.33% acc] Full-attention multi-instrumental music transformer for supervised music generation, opti… ☆32 · Nov 23, 2023 · Updated 2 years ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024) ☆78 · Nov 1, 2024 · Updated last year