bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆147 Updated 2 weeks ago
Alternatives and similar repositories for DeMo:
Users who are interested in DeMo are comparing it to the libraries listed below:
- supporting pytorch FSDP for optimizers ☆68 Updated last week
- Normalized Transformer (nGPT) ☆136 Updated 3 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆85 Updated 3 weeks ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆113 Updated 8 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆197 Updated 2 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆174 Updated this week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆108 Updated 2 weeks ago
- WIP ☆89 Updated 4 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆115 Updated 3 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 Updated 2 months ago
- ☆53 Updated 10 months ago
- ☆74 Updated 5 months ago
- look how they massacred my boy ☆62 Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series. ☆164 Updated 3 weeks ago
- Collection of autoregressive model implementations ☆67 Updated 3 weeks ago
- Efficient optimizers ☆126 Updated this week
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆89 Updated last week
- ☆64 Updated 3 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆181 Updated 5 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 7 months ago
- ☆39 Updated 10 months ago
- ☆27 Updated 5 months ago
- smolLM with Entropix sampler on pytorch ☆147 Updated last month
- ☆49 Updated 9 months ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 10 months ago
- ☆138 Updated 2 weeks ago
- Triton Implementation of HyperAttention Algorithm ☆46 Updated last year
- ☆48 Updated 2 months ago
- An introduction to LLM Sampling ☆66 Updated last month
- ☆119 Updated 3 months ago