ScenemaAI/scenema-audio

Zero-shot expressive voice cloning and speech generation. Generate anything from short clips to full-length audiobooks with realistic emotional delivery, pacing, and breath control. Clone any voice from a 10-second reference and perform emotions the original speaker never recorded.

☆534

Alternatives and similar repositories for scenema-audio

Users who are interested in scenema-audio are comparing it to the libraries listed below.

resemble-ai / DramaBox
super expressive prompting model based on ltx2.3
☆468 · May 23, 2026 · Updated last month
Shao-Music-AI / Shao
☆376 · Jul 13, 2026 · Updated last week
wildminder / awesome-ai-voice
List of open-source TTS, voice cloning, and music generation models
☆388 · Updated this week
HKUST-LongGroup / SwiftI2V
[arXiv 2026] Project page for paper "SwiftI2V: Efficient High-Resolution Image-to-Video Generation via Conditional Segment-wise Generatio…
☆85 · May 8, 2026 · Updated 2 months ago
GVCLab / PersonaLive
[CVPR 2026] PersonaLive! : Expressive Portrait Image Animation for Live Streaming
☆3,415 · May 15, 2026 · Updated 2 months ago
debpalash / OmniVoice-Studio
Local voice clone, video dubbing, dictation and audiobook maker. The open-source ElevenLabs alternative.
☆8,817 · Updated this week
sunnyxrxrx / X-Voice
X-Voice
☆176 · Jun 5, 2026 · Updated last month
ybouane / VideoFlow
Programmatic video for the web. Define videos with a fluent TypeScript API, compile them to a portable JSON format, and render to MP4 — i…
☆123 · Jun 19, 2026 · Updated last month
WhatDreamsCost / WhatDreamsCost-ComfyUI
LTX Director and a variety of other custom ComfyUI nodes and workflows
☆1,797 · Jul 15, 2026 · Updated last week
supertone-inc / supertonic
Lightning-Fast, On-Device, Multilingual TTS — running natively via ONNX.
☆13,469 · Jun 30, 2026 · Updated 3 weeks ago
ASLP-lab / FlashTTS
Fast Streaming TTS with MTP Acceleration and X-pred Mean Flow Distillation
☆63 · Jun 16, 2026 · Updated last month
xzf-thu / Mega-ASR
First foundation ASR built for the real world - 7 atomic acoustic conditions, 54 compound scenarios, 2.6M samples, and up to ~30% gains o…
☆1,081 · Jun 2, 2026 · Updated last month
cwx-worst-one / WavTTS
WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling
☆209 · Jun 6, 2026 · Updated last month
studio-dots-ai / dots.tts
☆945 · Jul 10, 2026 · Updated last week
OpenMOSS / MOSS-TTS
MOSS‑TTS Family is an open‑source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high‑fi…
☆3,853 · Jun 22, 2026 · Updated 3 weeks ago
ID-LoRA / ID-LoRA
[ECCV 2026] Generate high resolution videos with a custom voice and appearance, based on LTX-2/LTX-2.3 + Identity In-Context LoRA
☆347 · Jun 24, 2026 · Updated 3 weeks ago
k2-fsa / OmniVoice
High-Quality Voice Cloning TTS for 600+ Languages
☆8,391 · Updated this week
Francis-Rings / FlashPortrait
[CVPR 2026] We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length vide…
☆479 · Feb 21, 2026 · Updated 5 months ago
HeCheng0625 / Diffusion-Speech-Tokenizer
This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…
☆198 · Jan 25, 2026 · Updated 5 months ago
inclusionAI / Ming-omni-tts
Ming-omni-tts: Simple and Efficient Unified Generation of Speech, Music, and Sound with Precise Control
☆263 · Feb 26, 2026 · Updated 4 months ago
HumeAI / tada
Open Source Speech Language Model
☆1,007 · May 11, 2026 · Updated 2 months ago
harnexa / nexa-gauge
A graph-eval framework for LLMs
☆40 · Updated this week
xiquan-li / Resonate
[INTERSPEECH 2026] Pre-training, SFT, DPO and GRPO for Text-to-Audio Generation
☆48 · Apr 17, 2026 · Updated 3 months ago
kandinskylab / kvae-audio
KVAE-Audio: a continuous full-band audio waveform autoencoder
☆99 · Jun 30, 2026 · Updated 3 weeks ago
ysharma3501 / LuxTTS
A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime.
☆4,838 · Jun 5, 2026 · Updated last month
TencentYoutuResearch / T2I-L2P
Code for "L2P: Unlocking Latent Potential for Pixel Generation"
☆179 · Jul 11, 2026 · Updated last week
ysharma3501 / LavaSR
🌋LavaSR: Fast Speech restoration and enhancement
☆563 · Jun 19, 2026 · Updated last month
EasonXiao-888 / SpatialEdit
[Official Repo] SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing
☆214 · Apr 13, 2026 · Updated 3 months ago
xiaomi-research / dasheng-audiogen
end-to-end text to audio scene generation model
☆50 · Jun 16, 2026 · Updated last month
wsntxxn / UniFlow-Audio
☆72 · Updated this week
TencentARC / Pixal3D
[SIGGRAPH 2026] Pixal3D: Pixel-Aligned 3D Generation from Images
☆2,006 · Jun 23, 2026 · Updated 3 weeks ago
Scicom-AI-Enterprise-Organization / Multilingual-TTS
Building an actually open-source multilingual TTS (dataset included) for more than 150 languages, with voice cloning.
☆54 · Jul 14, 2026 · Updated last week
Kugelaudio / kugelaudio-open
Open-source text-to-speech for European languages with voice cloning
☆267 · Feb 6, 2026 · Updated 5 months ago
meituan-longcat / LongCat-Video
☆5,270 · May 27, 2026 · Updated last month
ZeyueT / AudioX
[ICLR 2026] Repository of AudioX
☆1,542 · Mar 10, 2026 · Updated 4 months ago
facebookresearch / WavFlow
MultiModal Audio Generation in Raw Waveform Space.
☆154 · May 26, 2026 · Updated last month
abus-aikorea / voice-pro
Gradio WebUI for creators and developers, featuring key TTS (Edge-TTS, kokoro) and zero-shot Voice Cloning (E2 & F5-TTS, CosyVoice), with…
☆11,189 · Jul 13, 2026 · Updated last week
jordandare / echo-tts
Echo-TTS inference codebase
☆204 · Dec 5, 2025 · Updated 7 months ago
Aratako / MioTTS-Inference
Inference server for MioTTS, a lightweight and fast LLM-based TTS model.
☆197 · Feb 14, 2026 · Updated 5 months ago