We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through contextual perception and Chain of Thought (CoT).
☆17 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for C2SER
Users that are interested in C2SER are comparing it to the libraries listed below.
- LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement ☆47 · Mar 10, 2025 · Updated last year
- Llasa Speed Up ☆63 · Jan 18, 2026 · Updated 4 months ago
- wenet_LLM_from_ASLP ☆15 · Nov 26, 2024 · Updated last year
- A Massive Contextual Speech Recognition Benchmark. ☆107 · Aug 6, 2025 · Updated 10 months ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU. ☆495 · Nov 23, 2025 · Updated 6 months ago
- ☆40 · Sep 25, 2025 · Updated 8 months ago
- Official repository for the WenetSpeech-Chuan dataset. ☆195 · Feb 5, 2026 · Updated 4 months ago
- A song aesthetic evaluation toolkit trained on SongEval. ☆307 · Apr 8, 2026 · Updated 2 months ago
- LLaSE: Maximizing Acoustic Preservation for LLaMA based Speech Enhancement ☆16 · Jul 11, 2025 · Updated 10 months ago
- ☆16 · Mar 12, 2024 · Updated 2 years ago
- This is a list of speech tasks and datasets, which can provide training data for Generative AI, AIGC, AI model training, intelligent spee… ☆82 · Jun 7, 2024 · Updated 2 years ago
- ☆13 · Jun 8, 2024 · Updated 2 years ago
- Inference code for Audiodec-Valle-Wenetspeech4TTS ☆51 · Jul 14, 2024 · Updated last year
- ☆34 · Sep 15, 2025 · Updated 8 months ago
- A Large-scale Cantonese Speech Corpus with Multi-dimensional Annotation ☆322 · Updated this week
- ☆24 · Jul 10, 2025 · Updated 10 months ago
- This is the official implementation of PGUSE ☆40 · Jun 7, 2025 · Updated last year
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆15 · Jan 12, 2024 · Updated 2 years ago
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages ☆19 · May 23, 2024 · Updated 2 years ago
- An instruct text-to-speech solution based on LLaSA and CosyVoice2 developed by the ASLP lab and collaborators. ☆250 · Feb 26, 2026 · Updated 3 months ago
- A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows ☆280 · Jan 8, 2026 · Updated 5 months ago
- This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective". ☆65 · Nov 5, 2025 · Updated 7 months ago
- ☆63 · Jul 5, 2025 · Updated 11 months ago
- Official code of SenSE. ☆87 · Oct 30, 2025 · Updated 7 months ago
- The official implementation of V-AURA: Temporally Aligned Audio for Video with Autoregression (ICASSP 2025) (Oral) ☆34 · Feb 11, 2026 · Updated 3 months ago
- Random Tips and Writeups. ☆15 · Feb 21, 2019 · Updated 7 years ago
- Di♪♪Rhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion ☆2,301 · Nov 27, 2025 · Updated 6 months ago
- ☆43 · Feb 8, 2025 · Updated last year
- Linux kernel study: the kernel in my mind ☆18 · Jun 24, 2025 · Updated 11 months ago
- ☆16 · Sep 12, 2023 · Updated 2 years ago
- An interactive TUI for visualizing code statistics from tokei. ☆37 · May 24, 2026 · Updated 2 weeks ago
- A Large-scale Wu Dialect Speech Corpus with Multi-dimensional Annotations ☆149 · Feb 6, 2026 · Updated 4 months ago
- 🐧 Ucanto UCAN RPC in Go ☆13 · Mar 18, 2026 · Updated 2 months ago
- Dhruva is an open-source platform for serving language AI models at scale. ☆23 · Aug 25, 2025 · Updated 9 months ago
- A Diffusion Probabilistic Model for Target Sound Extraction ☆40 · Sep 27, 2024 · Updated last year
- C++ version of ailia models repository ☆26 · May 14, 2026 · Updated 3 weeks ago
- LibAFLGo: Evaluating and Advancing Directed Greybox Fuzzing ☆26 · Mar 4, 2026 · Updated 3 months ago
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte… ☆48 · Mar 3, 2025 · Updated last year
- [ICLR 2024] This is the official implementation for the paper: "Beyond imitation: Leveraging fine-grained quality signals for alignment" ☆10 · May 5, 2024 · Updated 2 years ago