wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆221Updated 8 months ago
Alternatives and similar repositories for noise_step
Users that are interested in noise_step are comparing it to the libraries listed below
- NanoGPT-speedrunning for the poor T4 enjoyers☆71Updated 4 months ago
- SIMD quantization kernels☆87Updated last week
- Decentralized RL Training at Scale☆592Updated this week
- Gradient descent is cool and all, but what if we could delete it?☆104Updated 3 weeks ago
- ☆120Updated 8 months ago
- ☆145Updated 9 months ago
- an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)☆105Updated 6 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally-applicable memory systems for transformers.☆322Updated 10 months ago
- Getting crystal-like representations with harmonic loss☆194Updated 5 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…☆343Updated 9 months ago
- Normalized Transformer (nGPT)☆188Updated 10 months ago
- DeMo: Decoupled Momentum Optimization☆190Updated 9 months ago
- look how they massacred my boy☆64Updated 11 months ago
- Our solution for the arc challenge 2024☆176Updated 3 months ago
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches.☆274Updated 10 months ago
- Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget☆161Updated last month
- PyTorch implementation of models from the Zamba2 series.☆185Updated 7 months ago
- Simple & Scalable Pretraining for Neural Architecture Research☆291Updated 3 weeks ago
- Simple Transformer in Jax☆139Updated last year
- smolLM with Entropix sampler on pytorch☆150Updated 10 months ago
- rl from zero pretrain, can it be done? yes.☆268Updated 3 weeks ago
- 1.58 Bit LLM on Apple Silicon using MLX☆223Updated last year
- GRadient-INformed MoE☆264Updated 11 months ago
- Exploring Applications of GRPO☆249Updated 3 weeks ago
- Inference of Mamba models in pure C☆191Updated last year
- PTX-Tutorial Written Purely By AIs (Deep Research of Openai and Claude 3.7)☆66Updated 5 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs"☆155Updated 11 months ago
- Inference RWKV v7 in pure C.☆38Updated 3 weeks ago
- Plotting (entropy, varentropy) for small LMs☆98Updated 3 months ago
- Fast parallel LLM inference for MLX☆217Updated last year