Linyx1125 / MM-F2F
[ACL 2025] Predicting Turn-Taking and Backchannel in Human-Machine Conversations Using Linguistic, Acoustic, and Visual Signals
☆18 · Updated last month
Alternatives and similar repositories for MM-F2F
Users who are interested in MM-F2F are comparing it to the repositories listed below.
- Audio tokenization, in the fastest way possible! ☆53 · Updated last year
- ☆19 · Updated 6 months ago
- ☆57 · Updated last year
- Open TTS models, built for streaming on the edge ☆42 · Updated 6 months ago
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions ☆51 · Updated 4 years ago
- GPT for FACodec ☆13 · Updated last year
- [NCMMSC'2024] Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech ☆22 · Updated last year
- ☆62 · Updated last year
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆15 · Updated last year
- Code associated with the paper: CTC-DRO: Robust Optimization for Reducing Language Disparities in Speech Recognition. ☆15 · Updated 4 months ago
- Official Code for ParrotTTS ☆55 · Updated 11 months ago
- This is a fork of the original fairseq repository (version 0.12.2) with added classes for training mHuBERT-147. ☆18 · Updated 10 months ago
- StyleTTS 2 Optimized Training Fork ☆33 · Updated 7 months ago
- An official implementation of Style-Talker for Spoken Dialogue Generation ☆22 · Updated 8 months ago
- ☆38 · Updated 2 months ago
- This repository contains the code and data for the paper EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control by Haozhe Chen,… ☆78 · Updated 11 months ago
- A collection of all our phonemeizers for dataset construction and inference ☆26 · Updated 7 months ago
- GPT-style network for phonemization with durations of text ☆67 · Updated last year
- The demo page of UniAudio ☆34 · Updated last year
- Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization" ☆117 · Updated 3 months ago
- Collection of scripts from mHuBERT-147. ☆29 · Updated 10 months ago
- Towards Fine-grained Audio Captioning with Multimodal Contextual Cues ☆80 · Updated 3 months ago
- My vocoder experiments ☆31 · Updated last month
- Pushing the Limits of Zero-shot End-to-End Speech Translation ☆26 · Updated 9 months ago
- ☆37 · Updated last year
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨ ☆123 · Updated last month
- An unofficial PyTorch implementation of VALL-E ☆88 · Updated last month
- Official repository of the IEEE SLT 2024 paper "Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT" ☆42 · Updated this week
- ☆144 · Updated 3 weeks ago
- Putting flows on top of neural transducers for better TTS ☆64 · Updated last month