LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
View external linksLinks

☆111

Alternatives and similar repositories for grokking-at-the-edge-of-numerical-stability

Users that are interested in grokking-at-the-edge-of-numerical-stability are comparing it to the libraries listed below

Sorting:

ironjr / grokfast
View on GitHub
Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients"
☆576Jun 28, 2024Updated last year
HypherX / Evolution-Analysis
View on GitHub
☆25Dec 13, 2024Updated last year
nisten / grokadamw
View on GitHub
new optimizer
☆20Aug 4, 2024Updated last year
OSU-NLP-Group / GrokkedTransformer
View on GitHub
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆235Jul 19, 2025Updated 6 months ago
IPL-UV / CHG_LLM
View on GitHub
☆11Jun 12, 2024Updated last year
leftthomas / ProxyAnchor
View on GitHub
A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning"
☆11Jan 16, 2021Updated 5 years ago
LHL3341 / MetaLadder
View on GitHub
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025)
☆11Apr 18, 2025Updated 10 months ago
wenquanlu / huginn-latent-cot
View on GitHub
[COLM 2025: 1st Workshop on the Application of LLM Explainability to Reasoning and Planning] Latent Chain-of-Thought? Decoding the Depth-…
☆17Oct 4, 2025Updated 4 months ago
gravesec / chapter_13_figures
View on GitHub
Code for the figures in Chapter 13 of "Reinforcement Learning: An Introduction" by Sutton and Barto
☆14Jul 6, 2023Updated 2 years ago
jaekyeom / drop-bottleneck
View on GitHub
☆15May 15, 2021Updated 4 years ago
BryceZhuo / HybridNorm
View on GitHub
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆18Mar 7, 2025Updated 11 months ago
nikhilvyas / SOAP_MUON
View on GitHub
Combining SOAP and MUON
☆19Feb 11, 2025Updated last year
LAMDASZ-ML / Self-Backtracking
View on GitHub
☆52Feb 12, 2025Updated last year
samefarrar / entropix_mlx
View on GitHub
Modify Entropy Based Sampling to work with Mac Silicon via MLX
☆49Nov 6, 2024Updated last year
puffnfresh / atsboot
View on GitHub
A tiny 32 bit kernel written in ATS
☆26May 4, 2014Updated 11 years ago
rwightman / tensorflow
View on GitHub
Computation using data flow graphs for scalable machine learning
☆11Jan 13, 2017Updated 9 years ago
ThomasRochefortB / torch-gato
View on GitHub
Pytorch implementation of the Gato paper from Deepmind
☆12Feb 8, 2023Updated 3 years ago
RLG-Leiden / edugym
View on GitHub
☆15Sep 22, 2023Updated 2 years ago
Yiminghh / HOG-Diff
View on GitHub
☆20Jan 31, 2025Updated last year
yuqingd / cusp
View on GitHub
☆15Sep 7, 2022Updated 3 years ago
weixuan-wang123 / SADI
View on GitHub
☆16Sep 1, 2025Updated 5 months ago
nicolacastaman / rh-tamp
View on GitHub
Receding Horizon Task and Motion Planning
☆11Sep 1, 2021Updated 4 years ago
meetdavidwan / clamr
View on GitHub
CLaMR: Contextualized Late-Interaction for Multimodal Content Retrieval
☆23Jun 28, 2025Updated 7 months ago
xjdr-alt / llmri
View on GitHub
look how they massacred my boy
☆63Oct 16, 2024Updated last year
emmyqin / iw_sft
View on GitHub
☆27Jul 18, 2025Updated 7 months ago
sam-paech / antislop-vllm
View on GitHub
☆32Dec 20, 2025Updated last month
seydi1370 / Basis_Functions
View on GitHub
☆17Jun 12, 2024Updated last year
srush / drop7
View on GitHub
☆18Apr 19, 2024Updated last year
belindal / state-tracking
View on GitHub
Code and data for paper "(How) do Language Models Track State?"
☆21Mar 31, 2025Updated 10 months ago
IST-DASLab / QuEST
View on GitHub
Work in progress.
☆79Nov 25, 2025Updated 2 months ago
seal-rg / recurrent-pretraining
View on GitHub
Pretraining and inference code for a large-scale depth-recurrent language model
☆859Dec 29, 2025Updated last month
dylandoblar / noether-networks
View on GitHub
Meta-learning inductive biases in the form of useful conserved quantities.
☆39Nov 19, 2022Updated 3 years ago
THUDM / T1
View on GitHub
RL Scaling and Test-Time Scaling (ICML'25)
☆114Jan 23, 2025Updated last year
RobertCsordas / linear_layer_as_attention
View on GitHub
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Jun 11, 2025Updated 8 months ago
ulab-uiuc / ToMAP
View on GitHub
Official code repository for the paper "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"
☆22Sep 25, 2025Updated 4 months ago
GATECH-EIC / LLM4HWDesign_Starting_Toolkit
View on GitHub
LLM4HWDesign Starting Toolkit
☆19Oct 4, 2024Updated last year
bcml-labs / rosa-plus
View on GitHub
ROSA+: RWKV's ROSA implementation with fallback statistical predictor
☆32Oct 13, 2025Updated 4 months ago
EleutherAI / training-jacobian
View on GitHub
☆24Dec 11, 2024Updated last year
Shalev-Lifshitz / MultiAgentVerification
View on GitHub
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
☆27Mar 1, 2025Updated 11 months ago

LucasPrietoAl / grokking-at-the-edge-of-numerical-stabilityView external linksLinks

Alternatives and similar repositories for grokking-at-the-edge-of-numerical-stability

LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
View external linksLinks