laitselec / MuFun
☆34Updated 5 months ago
Alternatives and similar repositories for MuFun
Users that are interested in MuFun are comparing it to the libraries listed below
- ☆96Updated 3 months ago
- Make-An-Audio-3: Transforming Text/Video into Audio via Flow-based Large Diffusion Transformers☆118Updated 8 months ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"☆107Updated 3 months ago
- a text-conditional diffusion probabilistic model capable of generating high fidelity audio.☆188Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆197Updated last week
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆131Updated 4 months ago
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆104Updated 8 months ago
- ☆43Updated 5 months ago
- OpenS2S : Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆105Updated 6 months ago
- ☆111Updated 2 months ago
- A neural full-band audio codec for general audio sampled at 48 kHz with 7.5 kbps or 4.5 kbps.☆197Updated 6 months ago
- Implementation of Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching (NeurIPS'24)☆59Updated 9 months ago
- LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation (INTERSPEECH 2024)☆43Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.☆113Updated this week
- ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model☆211Updated last year
- small audio language model for reasoning☆86Updated last month
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆62Updated last year
- Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"☆106Updated last month
- ☆43Updated 11 months ago
- ☆49Updated last month
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆95Updated last year
- Audio-FLAN☆160Updated 4 months ago
- ☆114Updated 3 months ago
- ☆105Updated 4 months ago
- Streamable Text-to-Speech model using a language modeling approach, without vector quantization☆110Updated 8 months ago
- Data Pipeline, Models, and Benchmark for Omni-Captioner.☆115Updated 3 months ago
- ☆151Updated last year
- Text-audio foundation model from Boson AI☆117Updated 4 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆86Updated 3 weeks ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆81Updated last year