wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆220Updated last year
Alternatives and similar repositories for noise_step
Users who are interested in noise_step are comparing it to the libraries listed below.
- Getting crystal-like representations with harmonic loss☆195Updated 10 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆110Updated 11 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.☆347Updated last year
- RWKV in nanoGPT style☆197Updated last year
- ☆111Updated last year
- SIMD quantization kernels☆94Updated 5 months ago
- look how they massacred my boy☆63Updated last year
- ☆134Updated last year
- DeMo: Decoupled Momentum Optimization☆198Updated last year
- Open-source release accompanying Gao et al. 2025☆501Updated 2 months ago
- ☆147Updated last year
- Gradient descent is cool and all, but what if we could delete it?☆106Updated 5 months ago
- Exploring Applications of GRPO☆251Updated 5 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers☆73Updated 9 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆371Updated last year
- Beyond Language Models: Byte Models are Digital World Simulators☆334Updated last year
- ☆137Updated last year
- PyTorch implementation of models from the Zamba2 series.☆186Updated last year
- Normalized Transformer (nGPT)☆198Updated last year
- Simple & Scalable Pretraining for Neural Architecture Research☆308Updated 2 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated last year
- GRadient-INformed MoE☆264Updated last year
- ☆215Updated last month
- Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model☆263Updated 8 months ago
- Alex Krizhevsky's original code from Google Code☆199Updated 9 years ago
- Our solution for the arc challenge 2024☆188Updated 7 months ago
- A graph visualization of attention☆57Updated 8 months ago
- Inference of Mamba and Mamba2 models in pure C☆196Updated 2 weeks ago
- Inference RWKV v7 in pure C.☆44Updated 4 months ago
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training☆562Updated last year