Bert-vits2转写和标注独立整合Webui,整合阿里FunAsr,必剪Asr以及Whisper大模型
☆183Jul 10, 2024Updated last year
Alternatives and similar repositories for ASR_TOOLS_SenseVoice_WebUI
Users that are interested in ASR_TOOLS_SenseVoice_WebUI are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Compute WER and SER for speech recognition evaluation☆27Mar 18, 2026Updated 3 weeks ago
- 音频响度统一,音量归一化处理☆13May 3, 2024Updated last year
- Models and codes for INTERSPEECH 2023 paper DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model☆13Mar 30, 2025Updated last year
- Bert-VITS2-Extra_中文特化版本 训练和推理☆26Feb 10, 2024Updated 2 years ago
- CTC decoder with hotwords for ASR.☆35Apr 13, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- CosyVoice在Windows环境下使用的版本☆762Nov 19, 2024Updated last year
- Unofficial pytorch reproduction for the paper "Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction" (…☆60Apr 4, 2024Updated 2 years ago
- FunASR安卓端侧离线版本2pass全模式☆15Sep 4, 2023Updated 2 years ago
- Bert-vits2-V2.3 训练和推理☆50Mar 13, 2024Updated 2 years ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark☆36May 7, 2025Updated 11 months ago
- 张艺谋(国师)一键声音克隆和恶搞文本生成项目☆17Jun 15, 2023Updated 2 years ago
- A enterprise-grade Chinese-English code switch punctuator from funasr.☆33Apr 26, 2024Updated last year
- 基于Faster-whisper和modelscope一键生成双语字幕,双语字幕生成器,基于离线大模型,Generate bilingual subtitles with one click based on Faster-whisper and modelscope. O…☆417Dec 1, 2024Updated last year
- A streaming audio reader, processor, and writer built on top of soundfile, and PyAV (bindings for FFmpeg)☆38Mar 31, 2026Updated 2 weeks ago
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- ☆23Oct 17, 2024Updated last year
- Finding the most similar tone/color in a large collection of audio. 在一大堆音频中寻找最相似的音色。☆13Jun 17, 2024Updated last year
- A enterprise-grade Voice Activity Detector from modelscope and funasr.☆133Apr 26, 2023Updated 2 years ago
- Simple voice activity detection (VAD) algorithm in Python☆15Aug 10, 2023Updated 2 years ago
- (WIP)long form speech generatoins☆31Apr 2, 2025Updated last year
- paraformer(chinense asr) online onnx runtime for python☆54Mar 27, 2024Updated 2 years ago
- VITS2 using Phoneme-Level Japanese BERT☆14Dec 17, 2023Updated 2 years ago
- IMAGdressing在Windows环境下运行的webui界面☆22Jul 25, 2024Updated last year
- 基于Bert-vits2-Extra项目添加的流式推理和流式接口api功能☆16Apr 12, 2024Updated 2 years ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- 🇺🇦 Open Source Ukrainian Text-to-Speech datasets☆23Feb 24, 2025Updated last year
- 大量の音声データから笑い声部分を集めるやつ☆12May 23, 2024Updated last year
- SenseVoice-python: A enterprise-grade open source multi-language asr system from funasr opensource with onnxruntime☆112Oct 6, 2025Updated 6 months ago
- ☆36Sep 6, 2025Updated 7 months ago
- Multilingual Voice Understanding Model☆7,957Dec 30, 2025Updated 3 months ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Oct 13, 2024Updated last year
- ☆157Feb 6, 2025Updated last year
- API and websocket server for sensevoice. It has inherited some enhanced features, such as VAD detection, real-time streaming recognition,…☆540Oct 23, 2024Updated last year
- trying to reproduce suno v3☆34Jan 29, 2025Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- Torch Audio Forced Aligner for Mixed Chinese (Mandarin or Cantonese) and English.☆61Sep 5, 2025Updated 7 months ago
- G2pw's inference speed is accelerated by about 8-10 times. Change loop generated predictive data to only once and model loop prediction b…☆14Dec 30, 2023Updated 2 years ago
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very …☆53Jan 26, 2026Updated 2 months ago
- ☆14Jun 16, 2023Updated 2 years ago
- Just another FastSpeech 2 but cleaner code :)☆29Jun 28, 2024Updated last year
- Real-time end-to-end singing voice convertion☆24Nov 3, 2024Updated last year
- TTSAudioNormalizer is a specialized tool for TTS data production, featuring descriptive statistical analysis of audio loudness and loud…☆111Dec 20, 2024Updated last year