wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆220 · Updated last year
Alternatives and similar repositories for noise_step
Users interested in noise_step are comparing it to the repositories listed below.
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers. ☆344 · Updated last year
- ☆131 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆73 · Updated 8 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆109 · Updated 9 months ago
- GRadient-INformed MoE ☆265 · Updated last year
- look how they massacred my boy ☆63 · Updated last year
- ☆148 · Updated last year
- Gradient descent is cool and all, but what if we could delete it? ☆104 · Updated 4 months ago
- Our solution for the ARC challenge 2024 ☆186 · Updated 6 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆305 · Updated 3 weeks ago
- Getting crystal-like representations with harmonic loss ☆194 · Updated 8 months ago
- SIMD quantization kernels ☆93 · Updated 3 months ago
- RWKV in nanoGPT style ☆197 · Updated last year
- Exploring Applications of GRPO ☆251 · Updated 4 months ago
- Plotting (entropy, varentropy) for small LMs ☆99 · Updated 7 months ago
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators ☆333 · Updated last year
- ☆109 · Updated last year
- Alex Krizhevsky's original code from Google Code ☆197 · Updated 9 years ago
- Open-source release accompanying Gao et al. 2025 ☆471 · Updated 2 weeks ago
- RWKV-7: Surpassing GPT ☆102 · Updated last year
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches. ☆284 · Updated last month
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆849 · Updated last month
- ☆137 · Updated last year
- ☆211 · Updated 4 months ago
- Simple Transformer in Jax ☆141 · Updated last year
- A really tiny autograd engine ☆96 · Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆174 · Updated 11 months ago
- smol models are fun too ☆92 · Updated last year
- smolLM with Entropix sampler on PyTorch ☆149 · Updated last year