yukiarimo / hanasu
Hanasu is a human-like TTS model based on the multilingual Himitsu V1 transformer-based encoder and VITS architecture
☆28 · Updated this week
Alternatives and similar repositories for hanasu
Users who are interested in hanasu are comparing it to the libraries listed below.
- A random walk voice style cloning application for Kokoro text to speech ☆85 · Updated last week
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆38 · Updated 2 weeks ago
- Use the Moondream 2 model to detect faces and their gaze directions in videos. ☆40 · Updated 4 months ago
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆21 · Updated 2 months ago
- Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining. ☆30 · Updated 2 months ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆19 · Updated 2 months ago
- zero-shot realtime TTS system, fully offline, free and open source ☆39 · Updated last month
- An open source real-time AI inference engine for seamless scaling ☆19 · Updated last week
- ☆20 · Updated 2 weeks ago
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆56 · Updated 2 weeks ago
- A lightweight, efficient variation of the StyleTTS 2 text‐to‐speech model. ☆18 · Updated 2 weeks ago
- Open TTS models, built for streaming on the edge ☆43 · Updated 2 months ago
- 🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨ ☆35 · Updated last week
- Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control. ☆27 · Updated 3 weeks ago
- Yet Another (LLM) Web UI, made with Gemini ☆12 · Updated 5 months ago
- 1 min voice data can also be used to train a good TTS model! (few shot voice cloning) ☆26 · Updated this week
- Babylon.cpp is a C and C++ library for grapheme to phoneme conversion and text to speech synthesis. For phonemization a ONNX runtime port… ☆21 · Updated 9 months ago
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆24 · Updated 3 weeks ago
- Deploy Apollo HF space locally ☆40 · Updated 5 months ago
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G… ☆23 · Updated 2 months ago
- ☆22 · Updated 7 months ago
- Text-to-Music Generation with Rectified Flow Transformer ☆8 · Updated 9 months ago
- (WIP) A retrain of F5-TTS on permissively-licensed data ☆11 · Updated last month
- ☆95 · Updated last year
- ACE-Step: A Step Towards Music Generation Foundation Model ☆40 · Updated 2 weeks ago
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text ☆69 · Updated 7 months ago
- StyleTTS 2 Optimized Training Fork ☆29 · Updated 4 months ago
- Game Companion AI is an advanced application designed to enhance the gaming experience by providing real-time analysis and interpretation… ☆50 · Updated 8 months ago
- fast state-of-the-art speech models and a runtime that runs anywhere 💥 ☆55 · Updated 2 weeks ago
- Lightweight Gradio based WebUI for orpheusTTS - WSL / Linux [CUDA] ☆96 · Updated 2 months ago