jishengpeng / WavTokenizer
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
☆654 Updated last week
Related projects:
- PyTorch Implementation of AudioLCM (ACM-MM'24): an efficient and high-quality text-to-audio generation with latent consistency model. ☆1,114 Updated 2 months ago
- Official repo for WavCraft, an AI agent for audio creation and editing ☆647 Updated this week
- PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model ☆737 Updated 3 months ago
- Code for paper "GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators" ☆192 Updated last month
- AcademiCodec: An Open Source Audio Codec Model for Academic Research ☆564 Updated 8 months ago
- Code for paper "Large Language Models are Efficient Learners of Noise-Robust Speech Recognition" ☆120 Updated 4 months ago
- A large Chinese free language chain tool. You can get a free API from: open.bigmodel.cn ☆106 Updated 7 months ago
- Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models ☆207 Updated 3 weeks ago
- The code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts" ☆754 Updated last week
- Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation ☆1,086 Updated 10 months ago
- [ECCV 2022] Official implementation of the paper: Audio-Visual Segmentation ☆441 Updated last week
- [ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists" ☆517 Updated 4 months ago
- Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models" ☆242 Updated 3 months ago
- ☆106 Updated 5 months ago
- An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions ☆1,220 Updated last month
- Official PyTorch implementation of PDAE (NeurIPS 2022) ☆268 Updated 6 months ago
- ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec ☆175 Updated 2 weeks ago
- [CVPR 2024] Official code for "Text-Driven Image Editing via Learnable Regions" ☆260 Updated last month
- Unofficial Implementation of ReplaceAnything: https://aigcdesigngroup.github.io/replace-anything/ ☆526 Updated 3 months ago
- PantoMatrix: Co-Speech Talking Head and Gestures Generation ☆923 Updated 2 months ago
- Multilingual Corpus of Web Fiction ☆211 Updated 2 months ago
- PyTorch Implementation of ProDiff (ACM-MM'22) with an Extremely-Fast diffusion speech synthesis pipeline ☆429 Updated last year
- ☆727 Updated 3 weeks ago
- Real-time and accurate open-vocabulary end-to-end object detection ☆1,482 Updated last week
- GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation ☆575 Updated 7 months ago
- PyTorch Implementation of StyleSinger (AAAI 2024): Style Transfer for Out-of-Domain Singing Voice Synthesis ☆171 Updated 3 months ago
- PyTorch Implementation of GenerSpeech (NeurIPS'22): a text-to-speech model towards zero-shot style transfer of OOD custom voice. ☆313 Updated 7 months ago
- In this repository, you will learn how code works in VITS (Conditional Variational Autoencoder with Adversarial Learning for End-to-End Te… ☆156 Updated last year
- Code for Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation". ☆865 Updated last month
- TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models (2024 ICASSP) ☆122 Updated 3 weeks ago