LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆111 · Jul 23, 2025 · Updated 6 months ago
Alternatives and similar repositories for grokking-at-the-edge-of-numerical-stability
Users who are interested in grokking-at-the-edge-of-numerical-stability are comparing it to the repositories listed below.
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" · ☆576 · Jun 28, 2024 · Updated last year
- ☆25 · Dec 13, 2024 · Updated last year
- new optimizer · ☆20 · Aug 4, 2024 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' · ☆235 · Jul 19, 2025 · Updated 6 months ago
- ☆11 · Jun 12, 2024 · Updated last year
- A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning" · ☆11 · Jan 16, 2021 · Updated 5 years ago
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) · ☆11 · Apr 18, 2025 · Updated 10 months ago
- [COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-… · ☆17 · Oct 4, 2025 · Updated 4 months ago
- Code for the figures in Chapter 13 of "Reinforcement Learning: An Introduction" by Sutton and Barto · ☆14 · Jul 6, 2023 · Updated 2 years ago
- ☆15 · May 15, 2021 · Updated 4 years ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization · ☆18 · Mar 7, 2025 · Updated 11 months ago
- Combining SOAP and MUON · ☆19 · Feb 11, 2025 · Updated last year
- ☆52 · Feb 12, 2025 · Updated last year
- Modify Entropy Based Sampling to work with Mac Silicon via MLX · ☆49 · Nov 6, 2024 · Updated last year
- A tiny 32 bit kernel written in ATS · ☆26 · May 4, 2014 · Updated 11 years ago
- Computation using data flow graphs for scalable machine learning · ☆11 · Jan 13, 2017 · Updated 9 years ago
- Pytorch implementation of the Gato paper from Deepmind · ☆12 · Feb 8, 2023 · Updated 3 years ago
- ☆15 · Sep 22, 2023 · Updated 2 years ago
- ☆20 · Jan 31, 2025 · Updated last year
- ☆15 · Sep 7, 2022 · Updated 3 years ago
- ☆16 · Sep 1, 2025 · Updated 5 months ago
- Receding Horizon Task and Motion Planning · ☆11 · Sep 1, 2021 · Updated 4 years ago
- CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval · ☆23 · Jun 28, 2025 · Updated 7 months ago
- look how they massacred my boy · ☆63 · Oct 16, 2024 · Updated last year
- ☆27 · Jul 18, 2025 · Updated 7 months ago
- ☆32 · Dec 20, 2025 · Updated last month
- ☆17 · Jun 12, 2024 · Updated last year
- ☆18 · Apr 19, 2024 · Updated last year
- Code and data for paper "(How) do Language Models Track State?" · ☆21 · Mar 31, 2025 · Updated 10 months ago
- Work in progress. · ☆79 · Nov 25, 2025 · Updated 2 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model · ☆859 · Dec 29, 2025 · Updated last month
- Meta-learning inductive biases in the form of useful conserved quantities. · ☆39 · Nov 19, 2022 · Updated 3 years ago
- RL Scaling and Test-Time Scaling (ICML'25) · ☆114 · Jan 23, 2025 · Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … · ☆16 · Jun 11, 2025 · Updated 8 months ago
- Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind" · ☆22 · Sep 25, 2025 · Updated 4 months ago
- LLM4HWDesign Starting Toolkit · ☆19 · Oct 4, 2024 · Updated last year
- ROSA+: RWKV's ROSA implementation with fallback statistical predictor · ☆32 · Oct 13, 2025 · Updated 4 months ago
- ☆24 · Dec 11, 2024 · Updated last year
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers · ☆27 · Mar 1, 2025 · Updated 11 months ago