yujxx / PodAgent
PodAgent: A Comprehensive Framework for Podcast Generation
☆63 · Updated 2 weeks ago
Alternatives and similar repositories for PodAgent:
Users who are interested in PodAgent are comparing it to the libraries listed below:
- SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems ☆81 · Updated last year
- An LLM-based agent simulation framework that simulates human behavior and generates dynamic, text-based social graphs. ☆67 · Updated this week
- A curated list of video-to-audio generation resources ☆35 · Updated 5 months ago
- FlexRAG: A RAG Framework for Information Retrieval and Generation ☆136 · Updated last week
- Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction ☆166 · Updated last month
- "AI-Creator: Fully-Automated Video Editing with LLM Agents" ☆49 · Updated this week
- CLaMP 3: Universal Music Information Retrieval Across Unaligned Modalities and Unseen Languages ☆129 · Updated last month
- An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs ☆70 · Updated last week
- Implementation for the paper "ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems" ☆149 · Updated 3 weeks ago
- 💡 VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning ☆37 · Updated this week
- Official implementation for "ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization" ☆59 · Updated last month
- 🤗 R1-AQA Model: mispeech/r1-aqa ☆209 · Updated this week
- Dolphin is a multilingual, multitask ASR model jointly trained by DataoceanAI and Tsinghua University. ☆141 · Updated this week
- Kyutai with an "eye" ☆160 · Updated last week
- OpenOmni: Official implementation of Advancing Open-Source Omnimodal Large Language Models with Progressive Multimodal Alignment and Rea… ☆40 · Updated 2 weeks ago
- Extension of ChatTTS: 3x faster on Windows, with support for voice cloning and mobile deployment ☆144 · Updated last month
- Official code for the paper "Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis" ☆45 · Updated 6 months ago
- LUCY: Linguistic Understanding and Control Yielding Early Stage of Her ☆32 · Updated 2 weeks ago
- Flow mirror models from JZX AI Labs ☆43 · Updated 6 months ago
- ✨✨ VITA: Towards Open-Source Interactive Omni Multimodal LLM ☆11 · Updated 2 months ago
- Bambo is a new proxy framework. Compared with mainstream frameworks, it is more lightweight and flexible and can handle various load task… ☆35 · Updated last month
- A project for tri-modal LLM benchmarking and instruction tuning ☆28 · Updated last week
- A GPT-4o-level, real-time spoken dialogue system ☆302 · Updated 2 months ago
- RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios ☆53 · Updated 4 months ago
- A text-conditional diffusion probabilistic model capable of generating high-fidelity audio ☆161 · Updated 10 months ago