wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆222 · Updated 10 months ago
Alternatives and similar repositories for noise_step
Users interested in noise_step are comparing it to the libraries listed below.
- NanoGPT-speedrunning for the poor T4 enjoyers ☆72 · Updated 6 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆327 · Updated last year
- Inference of Mamba models in pure C ☆192 · Updated last year
- SIMD quantization kernels ☆92 · Updated 2 months ago
- DeMo: Decoupled Momentum Optimization ☆197 · Updated 11 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆107 · Updated 8 months ago
- look how they massacred my boy ☆63 · Updated last year
- GRadient-INformed MoE ☆264 · Updated last year
- Exploring Applications of GRPO ☆248 · Updated 2 months ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆547 · Updated 10 months ago
- Simple Transformer in Jax ☆139 · Updated last year
- PyTorch implementation of models from the Zamba2 series. ☆185 · Updated 9 months ago
- A really tiny autograd engine ☆97 · Updated 5 months ago
- rl from zero pretrain, can it be done? yes. ☆280 · Updated last month
- smol models are fun too ☆92 · Updated last year
- Normalized Transformer (nGPT) ☆192 · Updated last year
- RWKV in nanoGPT style ☆195 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research ☆300 · Updated 3 weeks ago
- ☆148 · Updated 11 months ago
- ☆136 · Updated last year
- ☆126 · Updated 10 months ago
- 1.58 Bit LLM on Apple Silicon using MLX ☆225 · Updated last year
- Gradient descent is cool and all, but what if we could delete it? ☆104 · Updated 3 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆249 · Updated 9 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7) ☆66 · Updated 7 months ago
- Fast parallel LLM inference for MLX ☆232 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆98 · Updated 6 months ago
- Getting crystal-like representations with harmonic loss ☆192 · Updated 7 months ago
- Quantized LLM training in pure CUDA/C++. ☆216 · Updated this week