wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆220 · Updated last year
Alternatives and similar repositories for noise_step
Users interested in noise_step are comparing it to the repositories listed below.
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆109 · Updated 10 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Updated 8 months ago
- SIMD quantization kernels ☆93 · Updated 4 months ago
- ☆148 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers. ☆345 · Updated last year
- ☆133 · Updated last year
- look how they massacred my boy ☆63 · Updated last year
- Exploring Applications of GRPO ☆252 · Updated 4 months ago
- RWKV in nanoGPT style ☆197 · Updated last year
- Simple Transformer in Jax ☆142 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆107 · Updated 8 months ago
- Normalized Transformer (nGPT) ☆196 · Updated last year
- RWKV-7: Surpassing GPT ☆103 · Updated last year
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated 11 months ago
- Getting crystal-like representations with harmonic loss ☆195 · Updated 9 months ago
- Alex Krizhevsky's original code from Google Code ☆198 · Updated 9 years ago
- MoE training for Me and You and maybe other people ☆319 · Updated 2 weeks ago
- Open-source release accompanying Gao et al. 2025 ☆490 · Updated last month
- Inference of Mamba models in pure C ☆195 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- Plotting (entropy, varentropy) for small LMs ☆99 · Updated 8 months ago
- smol models are fun too ☆93 · Updated last year
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in language… ☆126 · Updated 3 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆306 · Updated last month
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches. ☆285 · Updated last year
- rl from zero pretrain, can it be done? yes. ☆286 · Updated 3 months ago
- ☆137 · Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 · Updated last year
- smolLM with Entropix sampler on pytorch ☆149 · Updated last year