zxzhao0/C2SER


We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and chain of thought (CoT).

☆49 stars

Alternatives and similar repositories for C2SER

Users who are interested in C2SER are comparing it to the repositories listed below.

backspacetg / distilXLSR
Models and code for the INTERSPEECH 2023 paper "DistilXLSR: A Light Weight Cross-Lingual Speech Representation Model"
☆13 · Mar 30, 2025 · Updated last year
Kevin-naticl / LLaSE
LLaSE: Maximizing Acoustic Preservation for LLaMA-based Speech Enhancement
☆16 · Jul 11, 2025 · Updated last year
lmxue / Audio-FLAN
Audio-FLAN
☆161 · Sep 23, 2025 · Updated 10 months ago
ECNU-Cross-Innovation-Lab / ENT
[ICASSP 2024] Emotion Neural Transducer for Fine-Grained Speech Emotion Recognition
☆28 · Apr 11, 2024 · Updated 2 years ago
Jiaxin-Ye / Emo-DNA
[ACM MM 2023] Official PyTorch implementation of "Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Reco…
☆12 · Aug 4, 2023 · Updated 2 years ago
qualialabsAI / SmoothConv-DuplexConv
☆83 · Jun 12, 2026 · Updated last month
ASLP-lab / Speaker-Reasoner
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
☆93 · May 13, 2026 · Updated 2 months ago
HappyColor / Vesper
A Compact and Effective Pretrained Model for Speech Emotion Recognition
☆55 · Apr 10, 2026 · Updated 3 months ago
nonverbalspeech38k / nonverspeech38k
The official repository for the paper “NonVerbalSpeech-38K: A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understandi…
☆68 · Dec 26, 2025 · Updated 7 months ago
ASLP-lab / MSU-Bench
Open repository of "MSU-Bench: Towards Understanding the Conversational Multi-Speaker Scenarios"
☆18 · Jul 7, 2026 · Updated 2 weeks ago
MaikeZuefle / f-actor
☆28 · Jul 17, 2026 · Updated last week
Honee-W / U-SAM
Official repository for U-SAM (Interspeech 2025)
☆28 · Jun 3, 2025 · Updated last year
msplabresearch / MSP-Podcast_Challenge_IS2025
MSP-Podcast Challenge Baseline Code for Interspeech 2025
☆29 · Dec 4, 2024 · Updated last year
emo-box / EmoBox
[INTERSPEECH 2024] EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
☆321 · Mar 18, 2026 · Updated 4 months ago
sarulab-speech / DuplexChat
☆46 · Jul 5, 2026 · Updated 3 weeks ago
yoongi43 / VRVQ
Implementation of the paper "Variable Bitrate Residual Vector Quantization for Audio Coding"
☆11 · Apr 10, 2025 · Updated last year
yanghaha0908 / WavCube
Official code for "WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling"
☆62 · Jun 27, 2026 · Updated 3 weeks ago
BayLing-Models / BayLing-Duplex
Native full-duplex speech dialogue inference for BayLing-Duplex.
☆63 · Jun 22, 2026 · Updated last month
ASLP-lab / OSUM
OSUM and OSUM-EChat: an open speech understanding model and an empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.
☆495 · Nov 23, 2025 · Updated 8 months ago
jishengpeng / WavReward
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
☆56 · May 15, 2025 · Updated last year
LAION-AI / scaled-echo-tts
Scaled diffusion transformer for text-to-speech synthesis (DiT + T5Gemma2 conditioning, TorchTitan & Megatron backends, tested up to 1024…
☆24 · Mar 29, 2026 · Updated 3 months ago
KeiKinn / ParaCLAP
Towards a general language-audio model for computational paralinguistic tasks
☆30 · Dec 14, 2024 · Updated last year
xingchensong / TouchNet
A native-PyTorch library for large-scale M-LLM (text/audio) training with tp/cp/dp.
☆233 · Jul 2, 2026 · Updated 3 weeks ago
Tencent / StableToken
[ICLR 2026] StableToken: A state-of-the-art noise-robust semantic speech tokenizer featuring Voting-LFQ for resilient SpeechLLMs.
☆33 · Feb 27, 2026 · Updated 4 months ago
Soul-AILab / SAC
[ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.
☆108 · Nov 1, 2025 · Updated 8 months ago
inclusionAI / MingTok-Audio
☆88 · Feb 24, 2026 · Updated 5 months ago
FreedomIntelligence / S2S-Arena
☆21 · Jun 4, 2026 · Updated last month
scutcsq / DWFormer
DWFormer: Dynamic Window Transformer for Speech Emotion Recognition (ICASSP 2023 Oral)
☆69 · Jul 8, 2024 · Updated 2 years ago
fluxions-ai / stftvae
Inference for the STFT-VAE continuous audio codec (24 kHz, 3.125 Hz latent)
☆43 · Jul 12, 2026 · Updated last week
yuhanghe01 / RiTTA
Event Relation in Text-to-Audio (TTA) Generation
☆21 · Feb 26, 2025 · Updated last year
ASLP-lab / SmartGlasses
This challenge focuses on evaluating speech recognition and semantic understanding capabilities of AI glasses in complex real-world envir…
☆18 · Jun 27, 2026 · Updated 3 weeks ago
cuhealthybrains / MT-LLM
The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
☆51 · Apr 7, 2025 · Updated last year
Mddct / transformer-vocos
☆35 · Sep 6, 2025 · Updated 10 months ago
EMOsuperb / EMO-SUPERB-submission
EMO-SUPERB submission
☆51 · Oct 13, 2025 · Updated 9 months ago
ASLP-lab / LLaSE-G1
LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
☆47 · Mar 10, 2025 · Updated last year
Choddeok / Affectron
[ACL 2026 Findings] Affectron: Emotional Speech Synthesis with Affective and Contextually Aligned Nonverbal Vocalizations
☆20 · Jul 16, 2026 · Updated last week
P1ping / TokAN-Legacy
☆27 · Jun 22, 2026 · Updated last month
walker-hyf / ECSS
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI 2024)
☆59 · Jun 20, 2024 · Updated 2 years ago
walker-hyf / GPT-Talker
Generative Expressive Conversational Speech Synthesis (Accepted by MM 2024)
☆78 · Nov 1, 2024 · Updated last year