We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain of Thought (CoT).
☆17 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for C2SER
Users that are interested in C2SER are comparing it to the libraries listed below.
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆46 · Mar 10, 2025 · Updated last year
- Llasa Speed Up ☆63 · Jan 18, 2026 · Updated 4 months ago
- wenet_LLM_from_ASLP ☆15 · Nov 26, 2024 · Updated last year
- A Massive Contextual Speech Recognition Benchmark. ☆105 · Aug 6, 2025 · Updated 9 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆492 · Nov 23, 2025 · Updated 5 months ago
- ☆40 · Sep 25, 2025 · Updated 7 months ago
- Official repository for the WenetSpeech-Chuan dataset. ☆185 · Feb 5, 2026 · Updated 3 months ago
- A song aesthetic evaluation toolkit trained on SongEval. ☆303 · Apr 8, 2026 · Updated last month
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement ☆16 · Jul 11, 2025 · Updated 10 months ago
- ☆16 · Mar 12, 2024 · Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 · Jun 7, 2024 · Updated last year
- ☆13 · Jun 8, 2024 · Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆51 · Jul 14, 2024 · Updated last year
- ☆34 · Sep 15, 2025 · Updated 8 months ago
- A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation ☆304 · Feb 5, 2026 · Updated 3 months ago
- ☆24 · Jul 10, 2025 · Updated 10 months ago
- This is the official implementation of PGUSE ☆40 · Jun 7, 2025 · Updated 11 months ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆16 · Jan 12, 2024 · Updated 2 years ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆19 · May 23, 2024 · Updated last year
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators. ☆242 · Feb 26, 2026 · Updated 2 months ago
- A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows ☆271 · Jan 8, 2026 · Updated 4 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆64 · Nov 5, 2025 · Updated 6 months ago
- ☆60 · Jul 5, 2025 · Updated 10 months ago
- Official code of SenSE. ☆84 · Oct 30, 2025 · Updated 6 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆33 · Feb 11, 2026 · Updated 3 months ago
- Random Tips and Writeups. ☆15 · Feb 21, 2019 · Updated 7 years ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,295 · Nov 27, 2025 · Updated 5 months ago
- Linux kernel study: the kernel in my mind ☆18 · Jun 24, 2025 · Updated 10 months ago
- ☆43 · Feb 8, 2025 · Updated last year
- ☆16 · Sep 12, 2023 · Updated 2 years ago
- An interactive TUI for visualizing code statistics from tokei. ☆37 · May 2, 2026 · Updated 2 weeks ago
- A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations ☆143 · Feb 6, 2026 · Updated 3 months ago
- 🐧 Ucanto UCAN RPC in Go ☆13 · Mar 18, 2026 · Updated 2 months ago
- Dhruva is an open-source platform for serving language AI models at scale. ☆23 · Aug 25, 2025 · Updated 8 months ago
- A Diffusion Probabilistic Model for Target Sound Extraction ☆40 · Sep 27, 2024 · Updated last year
- C++ version of ailia models repository ☆25 · Updated this week
- LibAFLGo: Evaluating and Advancing Directed Greybox Fuzzing ☆26 · Mar 4, 2026 · Updated 2 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆47 · Mar 3, 2025 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated 2 years ago