M-E-AGI-Lab / Awesome-World-Models
Official Repo of From Masks to Worlds: A Hitchhiker’s Guide to World Models.
☆73 · Oct 26, 2025 · Updated 3 months ago
Alternatives and similar repositories for Awesome-World-Models
Users who are interested in Awesome-World-Models are comparing it to the repositories listed below.
- [ArXiv 26] FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions ☆122 · Updated this week
- [ICLR 2026] Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆217 · Dec 15, 2025 · Updated last month
- PyTorch Implementation of Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model ☆27 · Oct 10, 2024 · Updated last year
- Controlnet module for Wan2.2 ☆40 · Oct 30, 2025 · Updated 3 months ago
- Easy and Efficient dLLM Fine-Tuning ☆209 · Jan 21, 2026 · Updated 3 weeks ago
- ☆52 · Jul 16, 2025 · Updated 6 months ago
- [arxiv: 2512.19673] Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies ☆59 · Feb 6, 2026 · Updated last week
- Extend the Conditioning of Stable Diffusion to take Audio Embeddings Instead of Text Embeddings using the Wav2Vec2-BERT model ☆13 · Sep 25, 2024 · Updated last year
- Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems ☆75 · Jan 25, 2026 · Updated 3 weeks ago
- The author's implementation of FUDOKI, a multimodal large language model purely based on discrete flow matching. ☆68 · Dec 21, 2025 · Updated last month
- ☆65 · Dec 3, 2025 · Updated 2 months ago
- Unofficial implementation of JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation (https://arxiv.org/abs/2310.1… ☆32 · Jan 19, 2024 · Updated 2 years ago
- ☆83 · Nov 10, 2025 · Updated 3 months ago
- arxiv daily for speech translation, legal. Ref: Vincentqyw/cv-arxiv-daily ☆14 · Jan 6, 2025 · Updated last year
- A python algorithm to change the pitch of the voice in real time ☆13 · Dec 13, 2020 · Updated 5 years ago
- ☆70 · Dec 5, 2025 · Updated 2 months ago
- TASU: A New Style of Alignment of Speech LLM with only Text Training Data, zero-shot on ASR and Other SU tasks ☆21 · Jan 19, 2026 · Updated 3 weeks ago
- ☆42 · Jul 9, 2025 · Updated 7 months ago
- Using machine learning to improve simulations of a dynamical system ☆10 · Apr 24, 2019 · Updated 6 years ago
- Learning an Interpretable End-to-End Network for Real-Time Acoustic Beamforming ☆15 · Aug 20, 2024 · Updated last year
- ComfyUI workflows to create smooth transitions between video clips using Wan VACE. Works with video from any model or other source-LTX-2,… ☆29 · Feb 6, 2026 · Updated last week
- [AAAI 2025] Neural-Symbolic Collaborative Distillation: Advancing Small Language Models for Complex Reasoning Tasks ☆11 · Jun 19, 2025 · Updated 7 months ago
- Series of tutorials to show how to use gVXR ☆10 · May 13, 2024 · Updated last year
- Official Repo for the Paper Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control ☆39 · Dec 30, 2024 · Updated last year
- [ICLR 2025] Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation ☆45 · Mar 13, 2025 · Updated 11 months ago
- A curated list of awesome autoregressive papers in Generative AI ☆142 · Sep 26, 2025 · Updated 4 months ago
- ☆11 · Nov 7, 2024 · Updated last year
- Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge ☆21 · Jul 25, 2022 · Updated 3 years ago
- A universal adapter including zero-copy Python bindings for Philip Turner's metal flash attention library. ☆23 · Dec 15, 2025 · Updated 2 months ago
- Extended Implementation of FastLGS ☆16 · Dec 17, 2024 · Updated last year
- [ACM-MM 2025 Workshop] More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment. ☆25 · Nov 25, 2025 · Updated 2 months ago
- ☆34 · Oct 29, 2025 · Updated 3 months ago
- Retargeting of the InterAct dataset onto a common skeleton ☆22 · Sep 16, 2025 · Updated 4 months ago
- Open, royalty free, lyrics2song / song generation data collection / cleaning pipeline. ☆17 · May 9, 2025 · Updated 9 months ago
- Improving Symbolic Music Generation with Inference-Time Alignment ☆20 · Aug 2, 2025 · Updated 6 months ago
- An agentic runtime that enables secure, extensible and configurable AI automation from any model ☆17 · Jan 19, 2026 · Updated 3 weeks ago
- semantic tokenizer for speech and music ☆21 · Jul 6, 2025 · Updated 7 months ago
- ☆10 · May 24, 2021 · Updated 4 years ago
- [NeurIPS 2025] Encoder-Decoder Diffusion Language Models for Efficient Training and Inference ☆36 · Oct 29, 2025 · Updated 3 months ago