samsad35 / VQ-MAE-S-code

[ICASSPW] A Vector Quantized Masked AutoEncoder for speech emotion recognition

☆29

Alternatives and similar repositories for VQ-MAE-S-code

Users who are interested in VQ-MAE-S-code are comparing it to the repositories listed below.

cwx-worst-one / EAT
[IJCAI 2024] EAT: Self-Supervised Pre-Training with Efficient Audio Transformer
☆194 · Updated 5 months ago
thuhcsi / SECap
☆173 · Updated last year
HappyColor / SpeechFormer
Official implementation of SpeechFormer, written in Python (PyTorch).
☆80 · Updated 2 years ago
GalaxyCong / StyleDubber
[ACL 2024] PyTorch code for our paper "StyleDubber: Towards Multi-Scale Style Learning for Movie Dubbing"
☆94 · Updated last year
Ming-er / MGA-CLAP
Official implementation of MGA-CLAP (ACM MM 2024)
☆24 · Updated last year
choijeongsoo / av2av
[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
☆43 · Updated last year
01Zhangbw / Speech-and-audio-papers-Top-Conference
☆112 · Updated 6 months ago
HappyColor / Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
☆49 · Updated last year
HappyColor / SpeechFormer2
SpeechFormer++ in PyTorch
☆49 · Updated 2 years ago
EMOsuperb / EMO-SUPERB-submission
EMO-SUPERB submission
☆48 · Updated last month
zszheng147 / Spatial-AST
🦇 Encoder of BAT (Learning to Reason about Spatial Sounds with Large Language Models)
☆66 · Updated 9 months ago
thuhcsi / SpeechCraft
The official repository of the SpeechCraft dataset, a large-scale expressive bilingual speech dataset with natural language descriptions.
☆173 · Updated 7 months ago
roger-tseng / av-superb
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
☆58 · Updated last year
ECNU-Cross-Innovation-Lab / ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
☆25 · Updated last year
scutcsq / DWFormer
DWFormer: Dynamic Window Transformer for Speech Emotion Recognition (ICASSP 2023 Oral)
☆66 · Updated last year
YYX666660 / LAVSS
Code for LAVSS: Location-Guided Audio-Visual Spatial Audio Separation
☆16 · Updated 9 months ago
MorenoLaQuatra / audiocaps-download
This package aims to simplify downloading the AudioCaps dataset.
☆36 · Updated last year
GalaxyCong / HPMDubbing
[CVPR 2023] Official code for the paper "Learning to Dub Movies via Hierarchical Prosody Models"
☆110 · Updated last year
choijeongsoo / lip2speech-unit
[Interspeech 2023] Intelligible Lip-to-Speech Synthesis with Speech Units
☆47 · Updated last year
SiavashShams / ssamba
[SLT'24] The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
☆131 · Updated 3 weeks ago
Levent9 / Zero-shot-FaceVC
☆19 · Updated last year
JishengBai / AudioSetCaps
A 6-million Audio-Caption Paired Dataset Built with an LLM- and ALM-based Automatic Pipeline
☆191 · Updated 11 months ago
kaistmm / Audio-Mamba-AuM
Official Implementation of the work "Audio Mamba: Bidirectional State Space Model for Audio Representation Learning"
☆162 · Updated last year
imxtx / awesome-controllable-speech-synthesis
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Systematic Survey".
☆192 · Updated last week
ahaliassos / raven
Official implementation of RAVEn (ICLR 2023) and BRAVEn (ICASSP 2024)
☆77 · Updated 9 months ago
b04901014 / FT-w2v2-ser
Official implementation of the paper "Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition"
☆152 · Updated 4 years ago
usc-sail / peft-ser
[ACII 2023] PEFT-SER: On the Use of Parameter Efficient Transfer Learning Approaches For Speech Emotion Recognition Using Pre-trained Spe…
☆60 · Updated last year
chenqi008 / V2C
PyTorch implementation of “V2C: Visual Voice Cloning”
☆32 · Updated 2 years ago
joannahong / AV-RelScore
Audio-Visual Corruption Modeling of our paper "Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling an…
☆35 · Updated 2 years ago
zxzhao0 / C2SER
We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…
☆39 · Updated 8 months ago