An end-to-end framework for multi-speaker transcription that jointly models who spoke, when, and what.
☆231Jun 4, 2026Updated last week
Alternatives and similar repositories for SoulX-Transcriber
Users that are interested in SoulX-Transcriber are comparing it to the libraries listed below.
- ☆19Aug 23, 2024Updated last year
- 🎬 AI Movie Script & Storyboard Generator – An AI-powered tool that creates movie scripts with GPT-4 and visual storyboards using DALL-E …☆14Oct 9, 2024Updated last year
- Guqin performance analysis☆12Aug 31, 2020Updated 5 years ago
- Training code for MaskGCT-T2S model.☆25Dec 14, 2024Updated last year
- ☆36Sep 6, 2025Updated 9 months ago
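- Xiaohongshu automated publishing Skill: built on Playwright browser automation, with QR-code login, AI-generated images (Gemini/Tongyi Wanxiang), and automatic publishing of image-and-text notes. An OpenClaw AgentSkill.☆59Mar 2, 2026Updated 3 months ago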
- based on karpathy https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f☆109Apr 19, 2026Updated last month
- ☆19Nov 14, 2025Updated 6 months ago
- Working note for WSI analysis☆10Apr 3, 2023Updated 3 years ago
- Official PyTorch implementation of the TMI paper "Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for…☆16Mar 13, 2024Updated 2 years ago
- Generates auto-scrolling video shot-breakdown tables☆16Jul 25, 2024Updated last year
- Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval☆26Mar 3, 2026Updated 3 months ago
- The AI PPT category killer, the most powerful PPT Skill ever!!! Uses GPT to generate lavish image-format slides, then converts them into fully editable PPTX files.☆728Updated this week
- WavReward: Spoken Dialogue Models With Generalist Reward Evaluators☆56May 15, 2025Updated last year
- [ACL 2026 Main] Training, inference, and testing of the SAC speech codec model.☆106Nov 1, 2025Updated 7 months ago
- ☆15May 7, 2024Updated 2 years ago
- An edge-native Multimodal Android Agent that integrates multimodal perception, memory and action☆222May 22, 2026Updated 3 weeks ago
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning☆43Apr 13, 2026Updated last month
- A Benchmark and Evaluation Suite for Zero-shot Singing Voice Synthesis☆29Feb 11, 2026Updated 4 months ago
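- An AI image prompt inspiration library for internet designers, covering design scenarios such as apps, operations, posters, illustration, branding, and e-commerce. Every image comes with a copy-ready AI prompt so inspiration is always at hand; all images were produced with image2☆163May 7, 2026Updated last month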
- ☆23Sep 9, 2020Updated 5 years ago
- Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)☆43Oct 13, 2023Updated 2 years ago
- A data framework for music information retrieval focusing on electronic music.☆24Mar 18, 2024Updated 2 years ago
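- Kanyuzi (堪舆子) · Traditional feng shui consultant Claude Code Skill: Sanyuan Xuankong Flying Stars, Bazhai Mingjing (Eight Mansions), and auspicious date selection☆58Apr 11, 2026Updated 2 months ago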
- ☆25Jun 2, 2026Updated last week
- Yuan (元) — a unified destiny-reading skill for Codex, Claude Code, and Agent Skills runtimes. One input surface, six methods (BaZi / Chen…☆135Apr 27, 2026Updated last month
- High-performance, semantic turn detection for conversational AI☆41Oct 1, 2025Updated 8 months ago
- Official repo for "TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series"☆28May 14, 2025Updated last year
- Pipeline-Parallel Lecture: Simplest Dualpipe Implementation.☆31Sep 17, 2025Updated 8 months ago
- Source code for "BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement"☆14Feb 13, 2022Updated 4 years ago
- ☆13Nov 16, 2020Updated 5 years ago
- AI customer-acquisition tool built on Xiaohongshu's 私信通 direct-message service (free to use)☆26Jul 23, 2025Updated 10 months ago
- Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"☆91Sep 18, 2025Updated 8 months ago
- Lightning-Fast, On-Device TTS — running natively via ONNX.☆73May 18, 2026Updated 3 weeks ago
- Official repo for [ICLR 2026] "AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs"☆25Feb 28, 2026Updated 3 months ago
- ☆33Oct 28, 2025Updated 7 months ago
- 🎙️ Automatically transcribe audio/video into high-quality, speaker-specific Text-To-Speech datasets ✨☆18May 20, 2025Updated last year
- Repository for Niko and Alexis to share stuff☆28Aug 9, 2021Updated 4 years ago
- [INTERSPEECH 2025] The official implementation of DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for…☆17Sep 7, 2025Updated 9 months ago