cofe-ai / flm-audio
FLM-Audio is an audio-language sub-version of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.
☆51Updated this week
Alternatives and similar repositories for flm-audio
Users who are interested in flm-audio are comparing it to the libraries listed below.
- ☆40Updated 4 months ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆42Updated 2 months ago
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆96Updated 4 months ago
- ☆70Updated 2 months ago
- LongCat Audio Tokenizer and Detokenizer☆260Updated last week
- A Foundation Model for Industrial Signal Comprehensive Representation☆54Updated 4 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆85Updated 2 months ago
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.☆179Updated this week
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆123Updated 2 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.☆182Updated last year
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆91Updated 2 months ago
- ☆87Updated last month
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"☆101Updated last month
- ☆15Updated last year
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆194Updated 2 months ago
- ☆37Updated 8 months ago
- Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis☆39Updated 2 years ago
- ☆109Updated last month
- Di♪♪Rhythm 2: Efficient And High Fidelity Song Generation Via Block Flow Matching☆127Updated last month
- [AAAI 2026] DIFFA: Large Language Diffusion Models Can Listen and Understand☆37Updated last month
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai…☆41Updated 8 months ago
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆25Updated last year
- Code for the blog "Neural audio codecs: how to get audio into LLMs"☆138Updated last month
- Trying to reproduce Suno v3☆35Updated 10 months ago
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated last year
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"☆44Updated 2 months ago
- Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos☆25Updated last year
- ☆41Updated 10 months ago
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.☆77Updated last year
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆66Updated last year