wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆215 · Updated last month
Alternatives and similar repositories for noise_step:
Users interested in noise_step are comparing it to the libraries listed below:
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆656 · Updated this week
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆443 · Updated last month
- Distributed Training Over-The-Internet ☆880 · Updated 2 months ago
- ☆100 · Updated last month
- DeMo: Decoupled Momentum Optimization ☆180 · Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series. ☆176 · Updated 3 weeks ago
- ☆96 · Updated 4 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆287 · Updated 3 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆155 · Updated this week
- Muon optimizer: +~30% sample efficiency with <3% wallclock overhead ☆253 · Updated last week
- GRadient-INformed MoE ☆261 · Updated 4 months ago
- Normalized Transformer (nGPT) ☆152 · Updated 3 months ago
- A Self-adaptation Framework 🐙 that adapts LLMs for unseen tasks in real-time! ☆948 · Updated 3 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆411 · Updated 4 months ago
- Automating the Search for Artificial Life with Foundation Models! ☆374 · Updated last month
- Alex Krizhevsky's original code from Google Code ☆189 · Updated 8 years ago
- Gradient descent is cool and all, but what if we could delete it? ☆103 · Updated last month
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆588 · Updated last week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆215 · Updated 3 weeks ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆184 · Updated 9 months ago
- Fast parallel LLM inference for MLX ☆163 · Updated 7 months ago
- History files recorded from human interaction while solving ARC tasks ☆97 · Updated this week
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆130 · Updated this week
- smolLM with Entropix sampler on PyTorch ☆150 · Updated 3 months ago