liu12366262626 / AlignVSR
Visual Speech Recognition
☆19Updated 9 months ago
Alternatives and similar repositories for AlignVSR
Users that are interested in AlignVSR are comparing it to the libraries listed below
- DUSTED: Spoken-Term Discovery using Discrete Speech Units☆18Updated last year
- Code for InterSpeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition☆17Updated last year
- (Interspeech 2023 & ICASSP 2024) Official repository for ARMHuBERT and STaRHuBERT☆40Updated last year
- Source code and speech samples for the DSU-AVO paper accepted to INTERSPEECH 2023☆12Updated last year
- [INTERSPEECH 2025 Oral]Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆57Updated 3 months ago
- Survey on speech generation work.☆21Updated last year
- Implementation of CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning☆47Updated last year
- ☆60Updated 11 months ago
- ☆29Updated 4 months ago
- [ACMMM'2024] Generative Expressive Conversational Speech Synthesis☆39Updated 11 months ago
- FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)☆26Updated last year
- The implementation for "Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System".☆30Updated 2 months ago
- ☆28Updated 4 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆61Updated 11 months ago
- Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset☆27Updated 4 months ago
- Automatic speech annotator processing speech with voice activity detection, overlapping speech detection, speaker diarization and automat…☆33Updated last year
- Source code and demo for INTERSPEECH 2023 paper: DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion P…☆37Updated last year
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆53Updated 4 months ago
- Glow-TTS with Stochastic Duration Predictor and Stochastic Pitch Predictor☆18Updated 2 years ago
- Speaker-aware CTC (SACTC) for multi-talker overlapped speech recognition.☆21Updated 4 months ago
- FREECODEC: A DISENTANGLED NEURAL SPEECH CODEC WITH FEWER TOKENS☆23Updated last year
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆76Updated 11 months ago
- ☆19Updated last year
- This repository presents an evaluation framework for speech-to-speech (S2S) models, following the methodology described in the EmphAsses …☆24Updated last year
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆38Updated 7 months ago
- Streaming Vocos☆29Updated 4 months ago
- [WIP] Unofficial Implementation of Microsoft's PromptTTS2☆52Updated last year
- An unofficial PyTorch implementation of Mix-Phoneme-Bert☆40Updated 2 years ago
- SLMTokBench for paper "SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models"☆37Updated 2 years ago
- This repository contains prompts & best practices to annotate audio clips with a very high degree of details using Audio-Language-Models☆35Updated last year