wbrickner / noise_step
noise_step: Training in 1.58b With No Gradient Memory
☆219 · Updated 5 months ago
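The "1.58b" in the tagline refers to ternary weights in {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight). As a hedged illustration of that representation, here is a minimal absmean ternary quantization sketch in the style of "The Era of 1-bit LLMs" (also listed below); the function name and shapes are illustrative, not noise_step's actual API:

```python
import numpy as np

def ternary_quantize(w, eps=1e-8):
    # Absmean ternary quantization: scale by the mean absolute weight,
    # then round each entry into {-1, 0, +1}.
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

w = np.array([[0.9, -0.05, 0.4], [-1.2, 0.02, 0.7]])
q, scale = ternary_quantize(w)
# q is ternary: [[ 1. 0. 1.], [-1. 0. 1.]]
```

Each quantized weight needs only ~1.58 bits of storage, and matrix products against ternary weights reduce to additions and subtractions.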
Alternatives and similar repositories for noise_step
Users interested in noise_step are comparing it to the repositories listed below.
- prime-rl is a codebase for decentralized async RL training at scale ☆318 · Updated this week
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers. ☆310 · Updated 7 months ago
- Exploring Applications of GRPO ☆230 · Updated 3 weeks ago
- In this repository, I'm going to implement increasingly complex LLM inference optimizations ☆58 · Updated 2 weeks ago
- ☆111 · Updated 5 months ago
- Getting crystal-like representations with harmonic loss ☆187 · Updated 2 months ago
- SIMD quantization kernels ☆70 · Updated this week
- Simple Transformer in Jax ☆137 · Updated 11 months ago
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated last month
- PyTorch implementation of models from the Zamba2 series. ☆182 · Updated 4 months ago
- Inference of Mamba models in pure C ☆187 · Updated last year
- smolLM with Entropix sampler in PyTorch ☆150 · Updated 7 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 7 months ago
- Build your own visual reasoning model ☆379 · Updated last week
- ☆95 · Updated 6 months ago
- Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse … ☆447 · Updated this week
- Fast parallel LLM inference for MLX ☆189 · Updated 11 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆333 · Updated 5 months ago
- On-device intelligence. ☆348 · Updated 2 months ago
- look how they massacred my boy ☆63 · Updated 7 months ago
- Train your own SOTA deductive reasoning model ☆93 · Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆188 · Updated 6 months ago
- A graph visualization of attention ☆55 · Updated 2 weeks ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 4 months ago
- ComplexTensor: Machine Learning By Bridging Classical and Quantum Computation ☆75 · Updated 6 months ago
- Plotting (entropy, varentropy) for small LMs ☆97 · Updated 2 weeks ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆139 · Updated 3 months ago
- Reasoning Computers. Lambda Calculus, Fully Differentiable. Also Neural Stacks, Queues, Arrays, Lists, Trees, and Latches. ☆257 · Updated 7 months ago
- ☆140 · Updated 6 months ago
- A tree-based prefix cache library that allows rapid creation of looms: hierarchical branching pathways of LLM generations. ☆69 · Updated 3 months ago
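One entry above describes memory layers as a trainable key-value lookup that adds parameters without increasing FLOPs. A minimal dense sketch of that idea follows; all sizes and names here are hypothetical, and a production version would replace the exhaustive scoring with an approximate index (e.g. product keys) so only a small fraction of the memory is ever touched:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_mem, k = 16, 1024, 4  # hypothetical embedding size, memory slots, top-k

# Trainable memory: keys decide which slots fire; values carry the extra parameters.
keys = rng.normal(size=(n_mem, d))
values = rng.normal(size=(n_mem, d))

def memory_lookup(x):
    # Score every key against the query, keep only the top-k, and return
    # their softmax-weighted value sum. Only k value rows contribute, so
    # parameter count grows with n_mem while per-token work stays small
    # once key scoring is done through an approximate index.
    scores = keys @ x                        # (n_mem,)
    top = np.argpartition(scores, -k)[-k:]   # indices of the k best keys
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()
    return w @ values[top]                   # (d,)

x = rng.normal(size=d)
out = memory_lookup(x)
```

This is the conceptual shape of the mechanism only; the actual repository's implementation details (sparsity structure, indexing scheme) are truncated in the description above.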