bloc97/DeMo

DeMo: Decoupled Momentum Optimization

☆202

Alternatives and similar repositories for DeMo

Users that are interested in DeMo are comparing it to the libraries listed below.

NousResearch / DisTrO
Distributed Training Over-The-Internet
☆1,047 · Oct 14, 2025 · Updated 9 months ago
matttreed / diloco-sim
☆23 · Jan 5, 2025 · Updated last year
bentherien / mu_learned_optimization
[Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers
☆16 · Apr 15, 2026 · Updated 3 months ago
kyleliang919 / Super_Muon
☆68 · Mar 21, 2025 · Updated last year
main-horse / hnet-old
H-Net Dynamic Hierarchical Architecture
☆81 · Sep 11, 2025 · Updated 10 months ago
QuixiAI / grokadamw
☆137 · Aug 19, 2024 · Updated last year
doomslide / autoloom
Approximating the joint distribution of language models via MCTS
☆22 · Nov 3, 2024 · Updated last year
tiremoscode / dw-grupo58
☆20 · Nov 28, 2024 · Updated last year
buyi-Yang / getQzonehistory
☆12 · Nov 13, 2024 · Updated last year
abduvalimurodullayev1 / boilerplate_Drf
This is the boilerplate for a Django project. There are so many settings configurations
☆10 · Nov 7, 2025 · Updated 8 months ago
GovardhaneNitin / smart-inventory
A smart inventory management system that includes real-time stock tracking, supplier management, predictive analytics for inventory forec…
☆16 · Apr 22, 2025 · Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆93 · Oct 30, 2024 · Updated last year
evanatyourservice / llm-jax
Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.
☆19 · Jul 24, 2025 · Updated 11 months ago
Francesco215 / text-diffusion
Generates text with diffusion models. Reproduction of the Continuous Diffusion for Categorical Data paper by DeepMind
☆18 · Dec 9, 2024 · Updated last year
SinatrasC / entropix
Entropy Based Sampling and Parallel CoT Decoding
☆17 · Oct 9, 2024 · Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning a RoPE model on larger sequences than the pre-trained model adapts the model's context limit
☆62 · Jun 21, 2023 · Updated 3 years ago
vaguenebula / AlpacaDataReflect
An experiment to see if ChatGPT can improve the output of the Stanford Alpaca dataset
☆12 · Mar 29, 2023 · Updated 3 years ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆63 · Jul 1, 2025 · Updated last year
SinatrasC / entropix-smollm
SmolLM with Entropix sampler on PyTorch
☆148 · Oct 31, 2024 · Updated last year
joey00072 / Multi-Head-Latent-Attention-MLA-
Working implementation of DeepSeek MLA
☆44 · Jan 8, 2025 · Updated last year
Aleph-Alpha-Research / scaling
Scaling is a distributed training library and installable dependency designed to scale up neural networks, with a dedicated module for tr…
☆66 · Nov 18, 2025 · Updated 8 months ago
fal-ai-community / nano-mdm
Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun
☆57 · Mar 10, 2025 · Updated last year
PrimeIntellect-ai / prime-iroh
Asynchronous P2P communication backend for decentralized pipeline parallelism
☆45 · Jun 12, 2026 · Updated last month
NoahAmsel / PolarExpress
☆32 · Jul 6, 2026 · Updated 2 weeks ago
edwardmilsom / function-space-learning-rates-paper
Code for the paper "Function-Space Learning Rates"
☆23 · Jun 3, 2025 · Updated last year
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆257 · Jun 15, 2025 · Updated last year
google-deepmind / asyncdiloco
☆51 · Jan 18, 2024 · Updated 2 years ago
Noumena-Network / nmoe
MoE training for Me and You and maybe other people
☆394 · Mar 15, 2026 · Updated 4 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆143 · Jun 22, 2024 · Updated 2 years ago
edouardoyallon / acco
ACCO: An optimization algorithm for sharded distributed LLM training.
☆13 · May 22, 2025 · Updated last year
PrimeIntellect-ai / OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
☆582 · Updated this week
laramohan / wikillm
LLMs as Collaboratively Edited Knowledge Bases
☆52 · Feb 8, 2026 · Updated 5 months ago
zaydzuhri / flame
Fork of Flame repo for training of some new stuff in development
☆20 · Updated this week
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆112 · Mar 7, 2025 · Updated last year
mgmalek / efficient_cross_entropy
☆124 · May 28, 2024 · Updated 2 years ago
fal-ai-community / NativeSparseAttention
research impl of Native Sparse Attention (2502.11089)
☆62 · Feb 19, 2025 · Updated last year
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆150 · Feb 25, 2026 · Updated 4 months ago
Noumena-Network / NSA-Test
NSA Triton Kernels written with GPT5 and Opus 4.1
☆70 · Aug 12, 2025 · Updated 11 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63 · Oct 16, 2024 · Updated last year