We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain of Thought (CoT).
☆17 Mar 3, 2025 Updated last year
Alternatives and similar repositories for C2SER
Users who are interested in C2SER are comparing it to the libraries listed below.
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆46 Mar 10, 2025 Updated last year
- Llasa Speed Up ☆61 Jan 18, 2026 Updated 2 months ago
- wenet_LLM_from_ASLP ☆15 Nov 26, 2024 Updated last year
- A Massive Contextual Speech Recognition Benchmark. ☆105 Aug 6, 2025 Updated 7 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆483 Nov 23, 2025 Updated 3 months ago
- ☆39 Sep 25, 2025 Updated 5 months ago
- Official repository for the WenetSpeech-Chuan dataset. ☆164 Feb 5, 2026 Updated last month
- A song aesthetic evaluation toolkit trained on SongEval. ☆288 Jun 15, 2025 Updated 9 months ago
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement ☆16 Jul 11, 2025 Updated 8 months ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 Jun 7, 2024 Updated last year
- ☆13 Jun 8, 2024 Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆50 Jul 14, 2024 Updated last year
- ☆16 Jan 11, 2026 Updated 2 months ago
- ☆31 Sep 15, 2025 Updated 6 months ago
- A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation ☆280 Feb 5, 2026 Updated last month
- ☆22 Jul 10, 2025 Updated 8 months ago
- This is the official implementation of PGUSE ☆35 Jun 7, 2025 Updated 9 months ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆16 Jan 12, 2024 Updated 2 years ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆18 May 23, 2024 Updated last year
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators. ☆227 Feb 26, 2026 Updated 3 weeks ago
- ☆46 Jul 5, 2025 Updated 8 months ago
- A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows ☆232 Jan 8, 2026 Updated 2 months ago
- Official code of SenSE. ☆76 Oct 30, 2025 Updated 4 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆64 Nov 5, 2025 Updated 4 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆33 Feb 11, 2026 Updated last month
- Random Tips and Writeups. ☆15 Feb 21, 2019 Updated 7 years ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,261 Nov 27, 2025 Updated 3 months ago
- A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations ☆117 Feb 6, 2026 Updated last month
- ☆43 Feb 8, 2025 Updated last year
- Linux kernel study: the kernel in my heart ☆18 Jun 24, 2025 Updated 8 months ago
- ☆16 Sep 12, 2023 Updated 2 years ago
- An interactive TUI for visualizing code statistics from tokei. ☆33 Jan 20, 2026 Updated 2 months ago
- 🐧 Ucanto UCAN RPC in Go ☆13 Updated this week
- Dhruva is an open-source platform for serving language AI models at scale. ☆21 Aug 25, 2025 Updated 6 months ago
- Ecommerce Store is a Java-based web application built using Spring Boot MVC and Thymeleaf for creating a fully functional online shopping… ☆15 Jan 24, 2025 Updated last year
- C++ version of ailia models repository ☆24 Dec 31, 2025 Updated 2 months ago
- LibAFLGo: Evaluating and Advancing Directed Greybox Fuzzing ☆25 Mar 4, 2026 Updated 2 weeks ago
- A Diffusion Probabilistic Model for Target Sound Extraction ☆40 Sep 27, 2024 Updated last year
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆44 Mar 3, 2025 Updated last year