aliyun/alibabacloud-bailian-speech-demo

Sample Repository for the AlibabaCloud Bailian Speech SDK

☆421

Alternatives and similar repositories for alibabacloud-bailian-speech-demo

Users who are interested in alibabacloud-bailian-speech-demo are comparing it to the repositories listed below.

aliyun / alibabacloud-nls-python-sdk
alibabacloud-nls-python-sdk provides access to Alibaba Cloud's intelligent speech services, including speech recognition, speech synthesis, and file transcription.
☆80 · Aug 22, 2025 · Updated 11 months ago
easygoingbl / auditlimit
Content moderation and rate-limiting service
☆26 · May 18, 2025 · Updated last year
ABexit / ASR-LLM-TTS
This is a speech interaction system built on an open-source model, integrating ASR, LLM, and TTS in sequence. The ASR model is SenseVoice…
☆1,262 · Jun 3, 2026 · Updated last month
modelscope / FunASR
Open-source speech recognition toolkit for training, inference, streaming ASR, VAD, punctuation, speaker diarization pipelines, and OpenA…
☆19,467 · Updated this week
joey-zhou / xiaozhi-concurrent
A WebSocket concurrency testing tool developed for Xiaozhi ESP32 Server Java, used to test the performance and stability of the Xiaozhi service.
☆17 · Oct 4, 2025 · Updated 9 months ago
xinnan-tech / voiceprint-api
Voiceprint recognition API service based on 3D-Speaker, used to identify speakers on Xiaozhi devices.
☆123 · Jul 16, 2025 · Updated last year
volcengine / rtc-aigc-demo
RTC AIGC Demo
☆290 · Jul 13, 2026 · Updated last week
FireRedTeam / FireRedASR
Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR be…
☆1,940 · Feb 25, 2026 · Updated 5 months ago
QwenAudio / CosyVoice
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
☆22,400 · May 25, 2026 · Updated 2 months ago
wwbin2017 / bailing
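Bailing (百聆) is a GPT-4o-like voice conversation bot built with ASR+LLM+TTS. It integrates large models such as DeepSeek R1, connects to openClaw, and aims to be a true personal voice assistant, with latency as low as 800 ms. It runs on low-spec machines such as Macs and supports interruption.
☆1,742 · Apr 6, 2026 · Updated 3 months ago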
pengzhendong / streaming-sensevoice
Pseudo Streaming SenseVoice with Hotwords
☆467 · Jun 15, 2026 · Updated last month
QwenAudio / SenseVoice
Open-source SenseVoiceSmall model for Mandarin, Cantonese, English, Japanese, and Korean ASR, language ID, emotion recognition, and audio…
☆8,935 · Updated this week
tzyll / ChineseHP
Dataset for Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models in Interspeech 2024.
☆16 · Jul 4, 2024 · Updated 2 years ago
pengzhendong / speaker-diarization
Offline Speaker Diarization with SenseVoice by Sherpa ONNX.
☆15 · Dec 23, 2024 · Updated last year
xldistance / index-tts-2.5-perfect-webui
☆19 · Feb 9, 2026 · Updated 5 months ago
xinnan-tech / xiaozhi-esp32-server
Backend service for xiaozhi-esp32 that helps you quickly build an ESP32 device control server.
☆10,133 · Updated this week
JFalnes / Skribify
Skribify is a powerful transcription and summarization tool that leverages the power of OpenAI's GPT-4 and WhisperAI to generate concise …
☆12 · Apr 29, 2025 · Updated last year
Henry-23 / VideoChat
Real-time interactive digital human with customizable appearance and voice, supporting voice cloning, with conversation latency as low as 3 s.
☆1,296 · Dec 18, 2025 · Updated 7 months ago
ruzhila / voiceapi
Streaming ASR and TTS based on FastAPI + sherpa-onnx
☆222 · Nov 2, 2025 · Updated 8 months ago
XuSenfeng / xiaozhi-server-vision
Visual conversation for Xiaozhi
☆34 · Apr 25, 2025 · Updated last year
QwenAudio / Fun-ASR
Open-source LLM-based ASR model family for Chinese, dialect, accent, and multilingual speech, with FunASR, vLLM, streaming, and llama.cpp…
☆1,428 · Updated this week
modelscope / ClearerVoice-Studio
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Spe…
☆4,330 · Aug 14, 2025 · Updated 11 months ago
QwenLM / Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
☆1,921 · Jul 5, 2024 · Updated 2 years ago
fireredchat-submodules / livekit-plugins-fireredchat-pvad
FireRedChat pVAD plugin for LiveKit Agents
☆22 · Sep 16, 2025 · Updated 10 months ago
DakeQQ / Audio-Denoiser-ONNX
Utilizes ONNX Runtime for audio denoising.
☆134 · Updated this week
xphh / fireredasr-streaming
Low-latency real-time ASR based on FireRedASR
☆62 · Jul 8, 2025 · Updated last year
lighttransport / VisemeNet-infer
CPU inference version of VisemeNet-tensorflow
☆14 · Nov 6, 2019 · Updated 6 years ago
Mddct / cosyvoice2-flow-optimized
faster inference
☆27 · Jan 20, 2025 · Updated last year
78 / esp-wifi-connect
ESP32 component that helps connect to WiFi
☆92 · Jul 15, 2026 · Updated last week
lovemefan / fsmn-vad
An enterprise-grade Voice Activity Detector from modelscope and funasr.
☆139 · Apr 26, 2023 · Updated 3 years ago
lipku / LiveTalking
Real-time interactive streaming digital human
☆8,499 · Jul 19, 2026 · Updated last week
yuekaizhang / minutes
Podcast Summarizer with LLM Technology
☆30 · May 28, 2025 · Updated last year
0nutation / SpeechGPT2.github.io
☆12 · Jul 23, 2024 · Updated 2 years ago
TEN-framework / ten-framework
Open-source framework for conversational voice AI agents
☆10,966 · Updated this week
TEN-framework / ten-turn-detection
Turn detection for full-duplex dialogue communication
☆595 · Dec 26, 2025 · Updated 7 months ago
bluishfish / llavaprompt
A simple "Be My Eyes" web app with a llama.cpp/llava backend
☆15 · Jan 25, 2024 · Updated 2 years ago
text-gen / awesome-tg-package
Awesome templates for the Text Generator Obsidian plugin
☆10 · Nov 28, 2023 · Updated 2 years ago
TEN-framework / ten-vad
Voice Activity Detector (VAD): low-latency, high-performance, and lightweight
☆2,204 · Feb 2, 2026 · Updated 5 months ago
alibaba / spring-ai-alibaba
Agentic AI Framework for Java Developers
☆10,437 · Updated this week