Tencent-Hunyuan / HY-MT
☆474 · Jan 1, 2026 · Updated last month
Alternatives and similar repositories for HY-MT
Users interested in HY-MT are comparing it to the repositories listed below.
- Animate Any Character in Any World ☆88 · Jan 9, 2026 · Updated last month
- A Unified Visual Generator with Interleaved OmniModal Context ☆180 · Updated this week
- Official repo of "MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing" ☆77 · Jan 5, 2026 · Updated last month
- Encode/Decode magic data pockets inside images ☆15 · Sep 18, 2023 · Updated 2 years ago
- A Universal Framework for AI Video Watermark Removal ☆49 · Dec 5, 2025 · Updated 2 months ago
- Java SDK for Z.ai Open Platform ☆43 · Feb 2, 2026 · Updated last week
- ☆77 · Sep 25, 2025 · Updated 4 months ago
- ☆19 · Mar 27, 2024 · Updated last year
- ☆66 · Jan 12, 2026 · Updated last month
- Open-source reproducible benchmarks from Argmax ☆77 · Jan 19, 2026 · Updated 3 weeks ago
- ☆28 · Jun 4, 2025 · Updated 8 months ago
- The official repo for "OpenMoE 2: Sparse Diffusion Language Models". ☆52 · Dec 28, 2025 · Updated last month
- A high-quality rapid TTS voice cloning model that reaches speeds of 150x realtime. ☆694 · Jan 28, 2026 · Updated 2 weeks ago
- ☆131 · Dec 24, 2025 · Updated last month
- ☆86 · Feb 4, 2026 · Updated last week
- GLM-TTS: Controllable & Emotion-Expressive Zero-shot TTS with Multi-Reward Reinforcement Learning ☆923 · Dec 17, 2025 · Updated last month
- Face_lib separate from AI_Power ☆28 · Nov 10, 2025 · Updated 3 months ago
- Official PyTorch Implementation of "SVG-T2I: Scaling up Text-to-Image Latent Diffusion Model Without Variational Autoencoder". ☆132 · Dec 18, 2025 · Updated last month
- Fun-Audio-Chat is a Large Audio Language Model built for natural, low-latency voice interactions. ☆835 · Jan 29, 2026 · Updated 2 weeks ago
- Real time faster whisper gradio ☆25 · Aug 17, 2025 · Updated 5 months ago
- Speaker prediction for captions on the Lex Fridman podcast ☆27 · Feb 14, 2024 · Updated 2 years ago
- ☆27 · Dec 13, 2024 · Updated last year
- ☆53 · Aug 5, 2025 · Updated 6 months ago
- simple and fast wav2lip using onnx models for face-detection and inference. Easy installation ☆28 · Oct 14, 2024 · Updated last year
- High-speed batch audio enhancer that restores high-frequency details like Sony DSEE HX, converting any audio file to Hi-Res. ☆44 · Sep 7, 2025 · Updated 5 months ago
- ☆76 · Dec 8, 2025 · Updated 2 months ago
- Causal streaming adaptation of OpenAI Whisper for real-time transcription on small audio chunks. ☆62 · Sep 18, 2025 · Updated 4 months ago
- HunyuanImage-2.1: An Efficient Diffusion Model for High-Resolution (2K) Text-to-Image Generation ☆672 · Oct 14, 2025 · Updated 4 months ago
- project page for ChatAnyone ☆116 · Mar 28, 2025 · Updated 10 months ago
- Qwen3-ASR is an open-source series of ASR models developed by the Qwen team at Alibaba Cloud, supporting stable multilingual speech/music… ☆1,489 · Jan 30, 2026 · Updated 2 weeks ago
- DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors ☆35 · Feb 11, 2025 · Updated last year
- Forked from https://gitlab.com/MatejB/PrePoMax ☆12 · Jan 8, 2024 · Updated 2 years ago
- MiniMax-Provider-Verifier offers a rigorous, vendor-agnostic way to verify whether third-party deployments of the Minimax M2 model are co… ☆23 · Jan 15, 2026 · Updated last month
- Code for the blog "Neural audio codecs: how to get audio into LLMs" ☆151 · Oct 20, 2025 · Updated 3 months ago
- [AutoArk] GPA (General Purpose Audio) can do ASR, TTS and voice conversion with one tiny 300M model! ☆86 · Jan 29, 2026 · Updated 2 weeks ago
- This is a repository for "character images search" from image and tags. ☆37 · Sep 17, 2025 · Updated 4 months ago
- [NeurIPS'25 Spotlight] Official implementation of "JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation" ☆70 · Jan 10, 2026 · Updated last month
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings. ☆1,430 · Sep 22, 2025 · Updated 4 months ago
- Fine-tune the Whisper speech recognition model to support training without timestamp data, training with timestamp data, and training wit… ☆312 · Dec 22, 2025 · Updated last month