wonjune-kang/llm-speech-summarization

Prompting Large Language Models with Audio for General-Purpose Speech Summarization

☆20

Alternatives and similar repositories for llm-speech-summarization

Users who are interested in llm-speech-summarization compare it to the repositories listed below.


chenpk00 / IS2024_stream_decoder_only_asr
☆16 · Mar 12, 2024 · Updated 2 years ago
colaudiolab / AudioSet-R
Official implementation: "AudioSet-R: A Refined AudioSet with Multi-Stage LLM Label Reannotation"
☆19 · Oct 9, 2025 · Updated 9 months ago
xinyebei / 2026_finvcup_baseline
Baseline for the 2026 Xinye Cup (信也杯) competition
☆15 · Jun 17, 2026 · Updated last month
Sreyan88 / RECAP
Code for ICASSP 2024 Paper: RECAP: Retrieval-Augmented Audio Captioning
☆16 · Jun 23, 2024 · Updated 2 years ago
ozspeech / OZSpeech
[ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching
☆45 · Feb 9, 2025 · Updated last year
alanshaoTT / LAT-Audio-Repo
☆27 · Apr 28, 2026 · Updated 2 months ago
Jazzcharles / AuroLA
☆28 · Feb 23, 2026 · Updated 5 months ago
lifeiteng / NotebookTTS
Text-to-speech for NotebookLM
☆39 · Jul 20, 2025 · Updated last year
jyhan03 / dpccn
This repository provides an implementation of the DPCCN model for single-channel speech separation. More details will be updated soon.
☆13 · Dec 8, 2021 · Updated 4 years ago
pengzhendong / wavesurfer
For audio visualization and playback in Jupyter notebooks.
☆18 · Nov 25, 2025 · Updated 8 months ago
walker-hyf / FCTalker
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis (Accepted by ISCSLP'2024)
☆26 · Feb 22, 2024 · Updated 2 years ago
jjallaire / visualization-curriculum
A data visualization curriculum of interactive notebooks.
☆14 · Nov 21, 2021 · Updated 4 years ago
hs-oh-prml / EmotionControllableTextToSpeech
☆21 · Jun 16, 2021 · Updated 5 years ago
asif-hanif / palm
[EMNLP 2024] Official code repository of paper titled "PALM: Few-Shot Prompt Learning for Audio Language Models" accepted in EMNLP 2024 c…
☆29 · Dec 22, 2024 · Updated last year
Mddct / usm-tokenizer
Semantic tokenizer for speech and music
☆20 · Jul 6, 2025 · Updated last year
MorenoLaQuatra / audiocaps-download
This package aims at simplifying the download of the AudioCaps dataset.
☆35 · Dec 1, 2023 · Updated 2 years ago
p0p4k / Matcha-TTS-2
E2E TTS using Conditional Flow Matching (Experimental*)
☆71 · Nov 10, 2023 · Updated 2 years ago
xiaomi-research / dasheng-tokenizer
State-of-the-art continuous audio tokenization
☆40 · Mar 9, 2026 · Updated 4 months ago
AmphionTeam / SpeechJudge
SpeechJudge: Towards Human-Level Judgment for Speech Naturalness (https://arxiv.org/abs/2511.07931)
☆78 · Dec 23, 2025 · Updated 7 months ago
pengzhendong / streaming-tts-webui
Streaming Text to Speech Web UI
☆22 · May 6, 2024 · Updated 2 years ago
aizhiqi-work / OpenKWS
Open-source custom wake words
☆17 · Dec 24, 2025 · Updated 7 months ago
mkunes / w2v2_audioFrameClassification
wav2vec2 audio classification for prosodic boundary detection and other tasks
☆42 · Aug 11, 2023 · Updated 2 years ago
nhs-r-community / r4ds-ed2-exercise-solutions
Exercise solutions to R for Data Science (second edition), as part of the NHS-R Community book club
☆15 · Sep 9, 2023 · Updated 2 years ago
js05212 / CollaborativeDeepLearning-TensorFlow
Officially unofficial TensorFlow code for 'Collaborative Deep Learning for Recommender Systems' - SIGKDD
☆16 · Dec 14, 2019 · Updated 6 years ago
AI-S2-Lab / FluentEditor
[InterSpeech'2024] FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency
☆62 · Oct 23, 2024 · Updated last year
vinceasvp / meta-sc
☆11 · May 30, 2023 · Updated 3 years ago
walker-hyf / NCSSD
Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)
☆61 · Nov 1, 2024 · Updated last year
PigeonDan1 / ps-slm
TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks
☆27 · Updated this week
HuangZikang-TJU / Aug4TSE
☆15 · Sep 16, 2024 · Updated last year
rhss10 / joint-apa-mdd-mtl
Code for the Interspeech 2023 paper "A Joint Model for Pronunciation Assessment and Mispronunciation Detection and Diagnosis with Multi-t…
☆25 · Nov 9, 2023 · Updated 2 years ago
KdaiP / DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
☆57 · Nov 19, 2025 · Updated 8 months ago
xiaoxing2001 / DeGLA
[ACM MM25] Official PyTorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16 · Jul 15, 2025 · Updated last year
Nir-Bhay / markups
A sleek, real-time Markdown editor with advanced preview, syntax highlighting, and extensible plugins.
☆19 · Updated this week
the-bird-F / GLM-Voice-RAG
[EMNLP 2025 Findings] A complete cross-modal RAG system for end-to-end speech-to-speech large models, including ASR-based Retrieval and E…
☆31 · Jul 11, 2025 · Updated last year
Ego4DSounds / Ego4DSounds
Ego4DSounds: A diverse egocentric dataset with high action-audio correspondence
☆21 · Jun 14, 2024 · Updated 2 years ago
umbertocappellazzo / Llama-AVSR
Official PyTorch implementation of "Large Language Models are Strong Audio-Visual Speech Recognition Learners" [ICASSP 2025] and "Mitigat…
☆64 · Jan 18, 2026 · Updated 6 months ago
google-deepmind / librispeech-long
LibriSpeech-Long is a benchmark dataset for long-form speech generation and processing. Released as part of "Long-Form Speech Generation …
☆99 · Dec 28, 2024 · Updated last year
walker-hyf / ECSS
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling (Accepted by AAAI'2024)
☆59 · Jun 20, 2024 · Updated 2 years ago
ex3ndr / supervoice-gpt-facodec
GPT for FACodec
☆13 · Mar 25, 2024 · Updated 2 years ago