The code for creating the iGSM datasets in the papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Process" (arXiv 2407.20311) and "Physics of Language Models Part 2.2, How to Learn From Mistakes on Grade-School Math Problems" (arXiv 2408.16293).
☆86 · Jan 12, 2025 · Updated last year
Alternatives and similar repositories for iGSM
Users that are interested in iGSM are comparing it to the libraries listed below.
- [NeurIPS 2024] "Mind the Gap between Prototypes and Images in Cross-domain Finetuning" ☆11 · Nov 15, 2024 · Updated last year
- ☆20 · Nov 3, 2024 · Updated last year
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆345 · Updated this week
- This is the source code of FUSION, a safety-aware causal representation for generalizable driving agents. ☆26 · Oct 23, 2024 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Jul 17, 2025 · Updated 10 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆122 · Jan 27, 2026 · Updated 3 months ago
- ☆60 · Sep 17, 2025 · Updated 8 months ago
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆48 · Jul 17, 2025 · Updated 10 months ago
- 6,080-param transformer achieving 100% accuracy on 10-digit addition. Trained from scratch in 10 minutes. ☆22 · Feb 19, 2026 · Updated 3 months ago
- [NeurIPS 2024] "Discovery of the Hidden World with Large Language Models" ☆31 · Dec 2, 2024 · Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆238 · Jul 19, 2025 · Updated 10 months ago
- Functional Optimal Transport: Map Estimation and Domain Adaptation for Functional data ☆27 · Jun 7, 2021 · Updated 4 years ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆39 · Feb 27, 2024 · Updated 2 years ago
- ☆16 · Jun 12, 2024 · Updated last year
- [ICLR2026] The official repository for the CodeGym project: "Generalizable End-to-End Tool-Use RL with Synthetic CodeGym" ☆31 · Oct 14, 2025 · Updated 7 months ago
- ☆26 · Feb 20, 2026 · Updated 3 months ago
- Spectral Sphere Optimizer ☆116 · Mar 23, 2026 · Updated 2 months ago
- ☆152 · Apr 8, 2026 · Updated last month
- The Full Spectrum of Deepnet Hessians at Scale: Dynamics with SGD Training and Sample Size ☆19 · May 19, 2019 · Updated 7 years ago
- ☆61 · Aug 5, 2025 · Updated 9 months ago
- ☆38 · Feb 26, 2024 · Updated 2 years ago
- Benchmarking Optimizers for LLM Pretraining ☆58 · May 3, 2026 · Updated 3 weeks ago
- Code for the paper "A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis" ☆20 · Jun 12, 2025 · Updated 11 months ago
- toy reproduction of Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts ☆31 · Sep 1, 2024 · Updated last year
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆276 · Apr 26, 2024 · Updated 2 years ago
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking ☆12 · Jun 7, 2024 · Updated last year
- ☆244 · Aug 14, 2024 · Updated last year
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆28 · Oct 14, 2025 · Updated 7 months ago
- LoFiT: Localized Fine-tuning on LLM Representations ☆45 · Jan 15, 2025 · Updated last year
- ☆220 · Dec 23, 2025 · Updated 5 months ago
- ☆25 · Oct 20, 2022 · Updated 3 years ago
- Experiments for "A Closer Look at In-Context Learning under Distribution Shifts" ☆18 · May 29, 2023 · Updated 2 years ago
- Code for "What really matters in matrix-whitening optimizers?" ☆24 · Oct 31, 2025 · Updated 6 months ago
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence ☆64 · Nov 11, 2025 · Updated 6 months ago
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆160 · Jul 8, 2025 · Updated 10 months ago
- Feasibility Consistent Representation Learning for Safe Reinforcement Learning (ICML 2024). Current SOTA model-free safe RL algorithm on … ☆16 · Jul 12, 2024 · Updated last year
- Adversarially Robust Generalization Just Requires More Unlabeled Data ☆11 · Aug 8, 2019 · Updated 6 years ago
- flex-block-attn: an efficient block sparse attention computation library ☆131 · Dec 26, 2025 · Updated 5 months ago