PiAPI-1 / Moshi-API
☆9Updated 9 months ago
Alternatives and similar repositories for Moshi-API:
Users who are interested in Moshi-API are comparing it to the libraries listed below
- ☆20Updated 6 months ago
- Text-To-Speech for NotebookLM☆29Updated 4 months ago
- torchprime is a reference model implementation for PyTorch on TPU.☆15Updated this week
- WeNet speech-to-text for React Native☆10Updated 2 years ago
- We introduce the LLAMA1 Test Set, a comprehensive open-domain world knowledge QA dataset for evaluating question-answering systems. We pr…☆18Updated last year
- (WIP) Long-form speech generation☆31Updated 3 weeks ago
- noise reduction☆17Updated 9 months ago
- Speech-To-Text forced-alignment Speech processing Universal PERformance Benchmark☆27Updated 9 months ago
- Simple voice activity detection (VAD) algorithm in Python☆12Updated last year
- silero-vad PyTorch implementation☆17Updated 5 months ago
- CTC decoder with hotwords for ASR.☆18Updated last week
- Forced alignment decoder for Whisper.☆14Updated last year
- Streaming Text to Speech Web UI☆18Updated 11 months ago
- ☆26Updated 2 months ago
- High quality text-to-speech based on StyleTTS 2.☆36Updated this week
- ☆27Updated this week
- faster inference☆28Updated 3 months ago
- Implementation of Google's USM speech model in PyTorch☆31Updated 2 weeks ago
- Open TTS models, built for streaming on the edge☆39Updated last month
- A Comprehensive Mandarin Speech Dataset for Young Children Aged 3-5☆27Updated last month
- Offline Speaker Diarization with SenseVoice by Sherpa ONNX.☆12Updated 4 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆65Updated 5 months ago
- Target speaker automatic speech recognition (TS-ASR)☆11Updated last year
- Official Code for ParrotTTS☆48Updated 6 months ago
- An evaluation set for large-scale trained TTS models (Coming in Sep 2024)☆12Updated 7 months ago
- CML-TTS: A Multilingual Dataset for Speech Synthesis☆31Updated 8 months ago
- ☆24Updated 3 months ago
- (R&D) Text to speech using phonemes as inputs and audio codec codes as outputs. Loosely based on MegaByte, VALL-E and Encodec.☆48Updated last year
- GPT-style network for phonemization with durations of text☆64Updated last year
- ☆18Updated 5 months ago