wbrickner / noise_stepLinks

noise_step: Training in 1.58b With No Gradient Memory

☆222

Alternatives and similar repositories for noise_step

Users that are interested in noise_step are comparing it to the libraries listed below

Sorting:

VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72Updated 6 months ago
SakanaAI / evo-memory
Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.
☆327Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆192Updated last year
PrimeIntellect-ai / pi-quant
SIMD quantization kernels
☆92Updated 2 months ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆197Updated 11 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆107Updated 8 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63Updated last year
microsoft / GRIN-MoE
GRadient-INformed MoE
☆264Updated last year
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248Updated 2 months ago
PrimeIntellect-ai / OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
☆547Updated 10 months ago
xjdr-alt / simple_transformer
Simple Transformer in Jax
☆139Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 9 months ago
joey00072 / Tinytorch
A really tiny autograd engine
☆97Updated 5 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆280Updated last month
xjdr-alt / entropix-local
smol models are fun too
☆92Updated last year
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192Updated last year
BlinkDL / nanoRWKV
RWKV in nanoGPT style
☆195Updated last year
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated last year
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆300Updated 3 weeks ago
Jellyfish042 / Sudoku-RWKV
☆148Updated 11 months ago
QuixiAI / grokadamw
☆136Updated last year
jerber / lang-jepa
☆126Updated 10 months ago
exo-explore / mlx-bitnet
1.58 Bit LLM on Apple Silicon using MLX
☆225Updated last year
haraschax / nograd
Gradient descent is cool and all, but what if we could delete it?
☆104Updated 3 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 9 months ago
cloneofsimo / ptx-tutorial-by-aislop
PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)
☆66Updated 7 months ago
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆232Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆98Updated 6 months ago
KindXiaoming / grow-crystals
Getting crystal-like representations with harmonic loss
☆192Updated 7 months ago
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆216Updated this week