yukiarimo / hanasu
Hanasu is a human-like TTS model based on the multilingual Himitsu V1 transformer-based encoder and VITS architecture
☆30 Updated 2 weeks ago
Alternatives and similar repositories for hanasu
Users who are interested in hanasu are comparing it to the libraries listed below.
- Streaming and Fine-tuning for Chatterbox TTS ☆109 Updated last week
- A random walk voice style cloning application for Kokoro text to speech ☆99 Updated last week
- ☆22 Updated this week
- Implementation of Sesame's Conversational Speech Model for Hugging Face Transformers ☆56 Updated last month
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆90 Updated last month
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆22 Updated 2 months ago
- Adding a multi-text multi-speaker script (diffe) that is based on a script from asiff00 on issue 61 for Sesame: A Conversational Speech G… ☆23 Updated 2 months ago
- Dia-JAX: A JAX port of Dia, the text-to-speech model for generating realistic dialogue from text with emotion and tone control. ☆27 Updated last month
- ☆78 Updated this week
- Yet Another (LLM) Web UI, made with Gemini ☆12 Updated 6 months ago
- LLM backed Fantasy Tribe Game ☆18 Updated 7 months ago
- 🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨ ☆55 Updated 3 weeks ago
- Chatbot-to-speech using Orpheus TTS model. Interactive console app. ☆17 Updated last month
- Real-time processing and delivery of sentences from a continuous stream of characters or text chunks. ☆63 Updated last week
- Automated speech dataset creator ☆152 Updated last week
- Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS (E2 TTS) in MLX ☆27 Updated 8 months ago
- This repo provides a simple Gradio UI to run Qwen2 VL 72B AWQ in venv and have both image and video inferencing work. ☆30 Updated 8 months ago
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text ☆70 Updated 7 months ago
- ☆97 Updated last year
- Super simple python connectors for llama.cpp, including vision models (Gemma 3, Qwen2-VL). Compile llama.cpp and run! ☆25 Updated last month
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆18 Updated 3 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with limited amount of VRAM/other resources by exposing them on differe… ☆67 Updated last week
- Open TTS models, built for streaming on the edge ☆43 Updated 3 months ago
- BUD-E (Buddy) is an open-source voice assistant framework that facilitates seamless interaction with AI models and APIs, enabling the cre… ☆36 Updated 11 months ago
- ☆49 Updated 4 months ago
- 1 min voice data can also be used to train a good TTS model! (few shot voice cloning) ☆26 Updated 2 weeks ago
- Orpheus Chat WebUI ☆65 Updated 2 months ago
- Course Project for COMP4471 on RWKV ☆17 Updated last year
- An API for VoiceCraft. ☆25 Updated 11 months ago
- Think of it as giving your AI a searchable diary and knowledge base that grows with every conversation. ☆16 Updated last month