wonjune-kang / llm-speech-summarization
Prompting Large Language Models with Audio for General-Purpose Speech Summarization
☆18Updated 2 months ago
Alternatives and similar repositories for llm-speech-summarization
Users who are interested in llm-speech-summarization are comparing it to the libraries listed below
- We propose C2SER, a novel audio-language model designed to enhance the stability and accuracy of speech emotion recognition through conte…☆36Updated 4 months ago
- EMO-SUPERB submission☆44Updated 10 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆59Updated 8 months ago
- Collection of works for evaluating (and analyzing) large audio-language models (LALMs)☆32Updated this week
- A repo containing download guidance and corresponding scripts of the VoxBlink dataset.☆28Updated last year
- [ICLR 2025] Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes☆45Updated last month
- SSL Layerwise analysis for speech deepfake detection☆23Updated 5 months ago
- ☆33Updated last year
- A large-scale speech corpus introduced in Spark-TTS, built from diverse open-source datasets for training text-to-speech (TTS) systems.☆78Updated 2 months ago
- wav2vec2 audio classification for prosodic boundary detection and other tasks☆43Updated last year
- Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"☆68Updated 2 months ago
- Official PyTorch inference code for the Interspeech 2025 paper: Efficient Speech Enhancement via Embeddings from Pre-trained Generative A…☆46Updated last month
- Survey on speech generation work.☆20Updated last year
- Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"☆34Updated 3 weeks ago
- LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …☆78Updated 6 months ago
- The open source code for LLM-Codec☆136Updated 11 months ago
- VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling☆84Updated 8 months ago
- The project is associated with the recently-launched INTERSPEECH 2025 Workshop on Multilingual Conversational Speech Language Model (MLC-…☆38Updated 2 months ago
- Official Repository for "SingFake: Singing Voice Deepfake Detection"☆56Updated last year
- ☆71Updated last year
- SLT 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge☆12Updated last year
- Source code for DM-Codec.☆45Updated last month
- The reproduction code for Qwen-Audio fine-tuning☆42Updated 11 months ago
- Multi-task speech classification of the accent and gender of an English speaker on Mozilla's Common Voice dataset☆27Updated last month
- [InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency☆55Updated 8 months ago
- ☆61Updated 8 months ago
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)☆41Updated last year
- A low-bitrate single-codebook 16 kHz speech codec based on focal modulation☆93Updated 5 months ago
- The implementation for "Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"☆42Updated 3 months ago
- ☆31Updated 3 months ago