PodAgent: A Comprehensive Framework for Podcast Generation
☆122May 16, 2025Updated 9 months ago
Alternatives and similar repositories for PodAgent
Users that are interested in PodAgent are comparing it to the libraries listed below
- Pushing the Limits of Zero-shot End-to-End Speech Translation☆26Dec 12, 2024Updated last year
- [WWW'2024] "PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning"☆52Mar 20, 2024Updated last year
- The code implementation for the paper "DreamLifting: A Plug-in Module Lifting MV Diffusion Models for 3D Asset Generation".☆28Sep 1, 2025Updated 6 months ago
- The official implementation of the DIFFA series for dLLM-based large audio language model☆59Feb 2, 2026Updated last month
- [WSDM'2025] "MixRec: Heterogeneous Graph Collaborative Filtering"☆19Dec 19, 2024Updated last year
- A Massive Contextual Speech Recognition Benchmark.☆99Aug 6, 2025Updated 6 months ago
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction☆218Feb 28, 2025Updated last year
- Code for ICLR'24 Paper "Decoupling Weighing and Selecting for Integrating Multiple Graph Pre-training Tasks"☆10Mar 12, 2024Updated last year
- [ICDE'23] "DGNN: Disentangled Graph Social Recommendation"☆24Mar 24, 2023Updated 2 years ago
- 5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs☆57Nov 19, 2025Updated 3 months ago
- X-E-Speech: Joint Training Framework of Non-Autoregressive Cross-lingual Emotional Text-to-Speech and Voice Conversion☆111Apr 1, 2024Updated last year
- [INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"☆64Jun 16, 2025Updated 8 months ago
- ☆68Jul 16, 2023Updated 2 years ago
- OSUM & OSUM-EChat, open speech understanding model and empathetic spoken chatbot based on it, open-sourced by ASLP@NPU.☆482Nov 23, 2025Updated 3 months ago
- [WWW'2024] "GraphPro: Graph Pre-training and Prompt Learning for Recommendation"☆73Jun 10, 2024Updated last year
- [CIKM'2024] "EasyST: A Simple Framework for Spatio-Temporal Prediction"☆20Sep 17, 2024Updated last year
- Temporary anonymous version☆22Mar 20, 2024Updated last year
- [Recsys'2023] "RCL: Multi-Relational Contrastive Learning for Recommendation"☆16Sep 6, 2023Updated 2 years ago
- [CIKM'2023] "CL4ST: Spatio-Temporal Meta Contrastive Learning"☆24Jun 17, 2024Updated last year
- [ACM TIST] "LLM4Urban: Urban Computing in the Era of Large Language Models"☆43Apr 4, 2025Updated 10 months ago
- [ACL 2025] OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching☆45Feb 9, 2025Updated last year
- LoRA-based phoneme/prosody control for LLM-based TTS with no G2P - Lightweight adapter for editing and controlling the target language's phoneme…☆23Aug 14, 2025Updated 6 months ago
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 8 months ago
- This repository contains a series of works on diffusion-based speech tokenizers, including the official implementation of the paper: "TaD…☆76Jan 25, 2026Updated last month
- [AAAI 2024] CTX-txt2vec, the acoustic model in UniCATS☆64Nov 18, 2024Updated last year
- Descript Audio Codec - VAE Variant (.dac-vae): High-Fidelity Audio Compression with Variational Autoencoder☆31Aug 30, 2025Updated 6 months ago
- ☆346Apr 11, 2025Updated 10 months ago
- ☆18May 27, 2025Updated 9 months ago
- Official repo for CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations☆62Jan 16, 2025Updated last year
- [ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation☆304Nov 5, 2025Updated 3 months ago
- ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations☆184Mar 6, 2024Updated last year
- ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis☆154Sep 20, 2024Updated last year
- [CIKM'2023] "STExplainer: Explainable Spatio-Temporal Graph Neural Networks"☆60Jun 17, 2024Updated last year
- This repository implements a novel zero-shot TTS framework, named Flamed-TTS, focusing on the efficient generation and dynamic pacing in …☆57Aug 9, 2025Updated 6 months ago
- Generative Expressive Conversational Speech Synthesis (Accepted by MM'2024)☆78Nov 1, 2024Updated last year
- ✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM☆368May 27, 2025Updated 9 months ago
- Source code for the EMNLP 2025 paper “DM-Codec: Distilling Multimodal Representations for Speech Tokenization”☆56Jun 1, 2025Updated 9 months ago
- The source code for the paper XiaoiceSing2 (Interspeech 2023)☆49Jan 15, 2024Updated 2 years ago
- [ICASSP2025] Official code for VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis☆52Apr 9, 2025Updated 10 months ago