elyxlz / voxtral
Voxtral: Convert Mistral into an end-to-end SpeechLM. No information bottleneck, preserves prosody, learns interruptions from data. Unlike GPT-4o (closed) or Moshi (complex), it's open, simple, and natural.
☆18 Updated 4 months ago
Alternatives and similar repositories for voxtral
Users who are interested in voxtral are comparing it to the repositories listed below.
- The demo page of UniAudio ☆34 Updated last year
- Trying to build an all in one speech-text language model - a bit like GPT-4o ☆22 Updated last year
- Accompanying repository for the paper "DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions" ☆28 Updated this week
- Official code of the paper: Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis. ☆46 Updated 10 months ago
- Music production for silent film clips. ☆26 Updated 2 months ago
- ☆22 Updated last year
- ☆62 Updated 11 months ago
- PyTorch implementation of SoundCTM ☆97 Updated 3 months ago
- ☆47 Updated 8 months ago
- Whisper Speaker Identification (WSI), a cutting-edge model for multilingual speaker identification. ☆20 Updated 4 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆75 Updated last month
- ☆78 Updated 8 months ago
- This is a repository that collects common audio noise reduction models, using Gradio to demonstrate the use of each model, which is very … ☆40 Updated 7 months ago
- ☆8 Updated 11 months ago
- audiolm-pytorch training code ☆15 Updated last year
- ☆11 Updated last year
- ☆21 Updated this week
- SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning ☆37 Updated 3 weeks ago
- ☆107 Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 Updated 10 months ago
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching ☆63 Updated 2 months ago
- Codebase and project page for EDMSound ☆34 Updated last year
- SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer. ☆95 Updated 6 months ago
- A text-conditional diffusion probabilistic model capable of generating high fidelity audio. ☆167 Updated last year
- Open TTS models, built for streaming on the edge ☆43 Updated 4 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" ☆41 Updated 4 months ago
- ☆20 Updated last year
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆75 Updated 9 months ago
- Implementation of Strassen attention, from Kozachinskiy et al. of National Center of AI in Chile ☆38 Updated last week
- Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor… ☆59 Updated last year