We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain-of-Thought (CoT) reasoning.
☆17 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for C2SER
Users who are interested in C2SER are comparing it to the repositories listed below.
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆46 · Mar 10, 2025 · Updated last year
- Llasa Speed Up ☆62 · Jan 18, 2026 · Updated 3 months ago
- wenet_LLM_from_ASLP ☆15 · Nov 26, 2024 · Updated last year
- A Massive Contextual Speech Recognition Benchmark. ☆105 · Aug 6, 2025 · Updated 8 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆488 · Nov 23, 2025 · Updated 5 months ago
- ☆39 · Sep 25, 2025 · Updated 7 months ago
- Official repository for the WenetSpeech-Chuan dataset. ☆176 · Feb 5, 2026 · Updated 2 months ago
- A song aesthetic evaluation toolkit trained on SongEval. ☆301 · Apr 8, 2026 · Updated 3 weeks ago
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement ☆16 · Jul 11, 2025 · Updated 9 months ago
- ☆16 · Mar 12, 2024 · Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 · Jun 7, 2024 · Updated last year
- ☆13 · Jun 8, 2024 · Updated last year
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆51 · Jul 14, 2024 · Updated last year
- A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation ☆298 · Feb 5, 2026 · Updated 2 months ago
- ☆34 · Sep 15, 2025 · Updated 7 months ago
- ☆22 · Jul 10, 2025 · Updated 9 months ago
- This is the official implementation of PGUSE ☆40 · Jun 7, 2025 · Updated 10 months ago
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆16 · Jan 12, 2024 · Updated 2 years ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆19 · May 23, 2024 · Updated last year
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators. ☆240 · Feb 26, 2026 · Updated 2 months ago
- A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows ☆266 · Jan 8, 2026 · Updated 3 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆64 · Nov 5, 2025 · Updated 5 months ago
- ☆54 · Jul 5, 2025 · Updated 9 months ago
- Official code of SenSE. ☆83 · Oct 30, 2025 · Updated 5 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆33 · Feb 11, 2026 · Updated 2 months ago
- Random Tips and Writeups. ☆15 · Feb 21, 2019 · Updated 7 years ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,289 · Nov 27, 2025 · Updated 5 months ago
- ☆43 · Feb 8, 2025 · Updated last year
- Linux kernel learning: the kernel in my heart ☆18 · Jun 24, 2025 · Updated 10 months ago
- ☆16 · Sep 12, 2023 · Updated 2 years ago
- An interactive TUI for visualizing code statistics from tokei. ☆35 · Jan 20, 2026 · Updated 3 months ago
- A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations ☆143 · Feb 6, 2026 · Updated 2 months ago
- 🐧 Ucanto UCAN RPC in Go ☆13 · Mar 18, 2026 · Updated last month
- Dhruva is an open-source platform for serving language AI models at scale. ☆22 · Aug 25, 2025 · Updated 8 months ago
- A Diffusion Probabilistic Model for Target Sound Extraction ☆40 · Sep 27, 2024 · Updated last year
- C++ version of ailia models repository ☆25 · Dec 31, 2025 · Updated 4 months ago
- LibAFLGo: Evaluating and Advancing Directed Greybox Fuzzing ☆25 · Mar 4, 2026 · Updated last month
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆47 · Mar 3, 2025 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated last year