ZHZisZZ / dllm
dLLM: Simple Diffusion Language Modeling
☆529Updated this week
Alternatives and similar repositories for dllm
Users that are interested in dllm are comparing it to the libraries listed below
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆355Updated 11 months ago
- Simple & Scalable Pretraining for Neural Architecture Research☆299Updated 2 weeks ago
- Tina: Tiny Reasoning Models via LoRA☆304Updated last month
- An extension of the nanoGPT repository for training small MOE models.☆210Updated 8 months ago
- Training teachers with reinforcement learning able to make LLMs learn how to reason for test time scaling.☆348Updated 4 months ago
- ☆201Updated 11 months ago
- ☆301Updated last week
- Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation (NeurIPS 2025)☆504Updated last month
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆107Updated 8 months ago
- LongRoPE is a novel method that can extend the context window of pre-trained LLMs to an impressive 2048k tokens.☆267Updated 2 weeks ago
- ☆451Updated 2 months ago
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models☆223Updated last week
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆173Updated 10 months ago
- Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text"☆334Updated 10 months ago
- minimal GRPO implementation from scratch☆99Updated 8 months ago
- rl from zero pretrain, can it be done? yes.☆280Updated last month
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024☆346Updated 6 months ago
- Exploring Applications of GRPO☆248Updated 2 months ago
- GPU-optimized framework for training diffusion language models at any scale. The backend of Quokka, Super Data Learners, and OpenMoE 2 tr…☆259Updated this week
- code for training & evaluating Contextual Document Embedding models☆200Updated 6 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model☆843Updated last month
- ☆179Updated 3 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache☆129Updated 3 months ago
- Dream 7B, a large diffusion language model☆1,054Updated last month
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.☆564Updated 2 weeks ago
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"☆556Updated last month
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.☆327Updated last year
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM).☆297Updated this week
- Esoteric Language Models☆106Updated last month
- Open source interpretability artefacts for R1.☆163Updated 6 months ago