Linyx1125 / MM-F2F
[ACL 2025] Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
☆19Updated 2 months ago
Alternatives and similar repositories for MM-F2F
Users that are interested in MM-F2F are comparing it to the libraries listed below
- ☆19Updated 7 months ago
- Audio tokenization, in the fastest way possible!☆53Updated last year
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147.☆19Updated 11 months ago
- Open TTS models, built for streaming on the edge☆43Updated 7 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax☆15Updated last year
- ☆62Updated last year
- StyleTTS 2 Optimized Training Fork☆33Updated 8 months ago
- EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs☆38Updated last month
- ☆39Updated 3 months ago
- The demo page of UniAudio☆34Updated last year
- ☆57Updated last year
- audiolm-pytorch training code☆15Updated 2 years ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,…☆80Updated last year
- PyTorch implementation of NEUTART, a system that creates photorealistic talking avatars from an input text transcription.☆34Updated 7 months ago
- GPT for FACodec☆13Updated last year
- An official implementation of Style-Talker for Spoken Dialogue Generation☆23Updated 9 months ago
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions☆51Updated 4 years ago
- A collection of all our phonemizers for dataset construction and inference☆26Updated 7 months ago
- A collection of optimized utilities for text-to-audio processing, enhancing both training and inference workflows. This repository contai…☆39Updated 6 months ago
- LongCat Audio Tokenizer and Detokenizer☆112Updated this week
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …☆12Updated 10 months ago
- Training code and dataset cleansing with Sidon☆36Updated last week
- SpeechGLUE is a speech version of the GLUE benchmark, driven by text-to-speech.☆13Updated 2 years ago
- Anim-400K: A dataset designed from the ground up for automated dubbing of video☆108Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech☆22Updated last year
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition.☆15Updated 5 months ago
- Provide Gradio custom components to make the diarization-based audio labeling process easier and faster.☆68Updated last month
- My vocoder experiments☆31Updated 2 months ago
- KATube is a tool to automate the process of creating datasets for training Text-To-Speech (TTS) and Speech-To-Text (STT) models. From a l…☆23Updated last year
- An AR+AR TTS attempt.☆18Updated 9 months ago