cofe-ai / flm-audio
FLM-Audio is an audio-language subversion of RoboEgo/FLM-Ego -- an omnimodal model with native full duplexity.
☆43Updated last month
Alternatives and similar repositories for flm-audio
Users who are interested in flm-audio are comparing it to the libraries listed below.
- LongCat Audio Tokenizer and Detokenizer☆178Updated last week
- [NeurIPS' 25] Benchmark for evaluating TTS models on complex prosodic, expressiveness, and linguistic challenges.☆172Updated 2 weeks ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆40Updated last month
- OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model☆89Updated 3 months ago
- ☆40Updated 3 months ago
- We Speech Toolkit, LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction☆69Updated 2 weeks ago
- Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptat…☆158Updated last week
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆185Updated last month
- LSLM implements full duplex modeling in interactive speech language models, based on research by Ma et al. (2024). This project advances …☆81Updated 4 months ago
- ☆103Updated last week
- This project trains an RWKV LLM for TTS generation that is compatible with other TTS engines (like fish/cosy/chattts).☆88Updated 3 weeks ago
- Official code for "EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting"☆89Updated 2 weeks ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆76Updated last year
- ☆14Updated last year
- A Foundation Model for Industrial Signal Comprehensive Representation☆47Updated 2 months ago
- A unified tokenizer that is capable of both extracting semantic information and enabling high-fidelity audio reconstruction.☆105Updated last month
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her☆56Updated 6 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues☆81Updated last month
- [ACL 2024] Generative Pre-Trained Speech Language Model with Efficient Hierarchical Transformer☆65Updated last year
- ☆41Updated 8 months ago
- ☆66Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT"☆43Updated last month
- Code for the blog "Neural audio codecs: how to get audio into LLMs"☆110Updated last week
- ☆50Updated 7 months ago
- A trainer for SNAC (Multi-Scale Neural Audio Codec) with the decoder replaced by Vocos.☆58Updated last year
- Code for ACL 2024 main conference paper "Can We Achieve High-quality Direct Speech-to-Speech Translation Without Parallel Speech Data?".☆25Updated last year
- A fast speech-to-speech & speech-to-text translation model that supports simultaneous decoding and offers 28× speedup.☆76Updated last year
- ☆100Updated last month
- Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation☆280Updated 2 weeks ago
- GPT-style network for phonemization with durations of text☆67Updated last year