☆115 · Jul 23, 2025 · Updated 8 months ago
Alternatives and similar repositories for grokking-at-the-edge-of-numerical-stability
Users that are interested in grokking-at-the-edge-of-numerical-stability are comparing it to the libraries listed below.
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆578 · Jun 28, 2024 · Updated last year
- Combining SOAP and MUON ☆19 · Feb 11, 2025 · Updated last year
- ☆24 · Dec 11, 2024 · Updated last year
- The NEKO Project is an open source effort to build a model of equivalent scale and capability as that reported in DeepMind’s 2022 Paper, … ☆10 · Sep 2, 2023 · Updated 2 years ago
- ☆25 · Dec 13, 2024 · Updated last year
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization ☆18 · Mar 7, 2025 · Updated last year
- Modify Entropy Based Sampling to work with Mac Silicon via MLX ☆49 · Nov 6, 2024 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆234 · Jul 19, 2025 · Updated 8 months ago
- look how they massacred my boy ☆63 · Oct 16, 2024 · Updated last year
- ☆24 · Dec 9, 2024 · Updated last year
- Code for ☆28 · Dec 16, 2024 · Updated last year
- SETOL: SemiEmpirical Theory of (Deep) Learning ☆29 · Jul 23, 2025 · Updated 8 months ago
- Github Repository for the HOI4 ULTRA Project. ☆11 · Updated this week
- ☆16 · Sep 1, 2025 · Updated 6 months ago
- ☆23 · Nov 6, 2022 · Updated 3 years ago
- Code accompanying the paper "Generalized Interpolating Discrete Diffusion" ☆115 · Jun 9, 2025 · Updated 9 months ago
- ☆21 · Jan 31, 2025 · Updated last year
- FC-KAN: Function Combinations in Kolmogorov-Arnold Networks ☆37 · Updated this week
- PyTorch implementation of the Mamba-3 architecture ☆73 · Mar 18, 2026 · Updated last week
- ☆120 · Updated this week
- Pretraining and inference code for a large-scale depth-recurrent language model ☆865 · Dec 29, 2025 · Updated 2 months ago
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… ☆17 · Oct 4, 2025 · Updated 5 months ago
- Code for "What really matters in matrix-whitening optimizers?" ☆23 · Oct 31, 2025 · Updated 4 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆27 · Oct 14, 2025 · Updated 5 months ago
- Attention Kernels for Symmetric Power Transformers ☆129 · Sep 25, 2025 · Updated 6 months ago
- Efficient PScan implementation in PyTorch ☆17 · Jan 2, 2024 · Updated 2 years ago
- MatFormer repo ☆72 · Dec 9, 2024 · Updated last year
- ☆11 · Jun 12, 2024 · Updated last year
- ☆17 · Jun 12, 2024 · Updated last year
- MultiscaleGraphSignalTransforms.jl is a collection of software tools written in the Julia programming language for graph signal processin… ☆12 · Mar 15, 2026 · Updated last week
- [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms" ☆27 · Oct 23, 2025 · Updated 5 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆83 · May 17, 2024 · Updated last year
- GeoZarr extension for OpenLayers ☆12 · Jun 27, 2024 · Updated last year
- slowly building a set of infinite riddle generators for data-hungry methods ☆14 · Nov 15, 2022 · Updated 3 years ago
- Linear Attention for Efficient Bidirectional Sequence Modeling ☆16 · May 13, 2025 · Updated 10 months ago
- ☆47 · May 20, 2025 · Updated 10 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Jun 11, 2025 · Updated 9 months ago
- ☆16 · Feb 22, 2025 · Updated last year
- A Wikipedia-based summarization dataset ☆14 · Mar 27, 2023 · Updated 2 years ago