Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆240 · Jul 19, 2025 · Updated 11 months ago
Alternatives and similar repositories for GrokkedTransformer
Users that are interested in GrokkedTransformer are comparing it to the libraries listed below.
- ☆19 · Mar 25, 2025 · Updated last year
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Feb 23, 2024 · Updated 2 years ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆83 · Apr 12, 2024 · Updated 2 years ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆582 · Jun 28, 2024 · Updated 2 years ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" ☆13 · Jun 22, 2025 · Updated last year
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following ☆16 · Oct 31, 2024 · Updated last year
- [EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks ☆10 · Nov 27, 2024 · Updated last year
- Exploring the Limitations of Large Language Models on Multi-Hop Queries ☆33 · Mar 2, 2025 · Updated last year
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆36 · Oct 3, 2024 · Updated last year
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆88 · Jan 12, 2025 · Updated last year
- Pretraining and inference code for a large-scale depth-recurrent language model ☆894 · Dec 29, 2025 · Updated 6 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆87 · Nov 27, 2024 · Updated last year
- ☆21 · May 14, 2026 · Updated last month
- Code for my NeurIPS 2024 ATTRIB paper titled "Attribution Patching Outperforms Automated Circuit Discovery" ☆47 · May 31, 2024 · Updated 2 years ago
- ☆59 · Nov 19, 2024 · Updated last year
- Providing the answer to "How to do patching on all available SAEs on GPT-2?". It is an official repository of the implementation of the p… ☆13 · Jan 26, 2025 · Updated last year
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Jun 11, 2025 · Updated last year
- ☆119 · Jul 23, 2025 · Updated 11 months ago
- [EMNLP 2024] Tree of Problems: Improving structured problem solving with compositionality ☆20 · Mar 4, 2025 · Updated last year
- ☆33 · Jan 7, 2025 · Updated last year
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆146 · Oct 27, 2024 · Updated last year
- Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod… ☆127 · Jan 10, 2026 · Updated 5 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective ☆43 · Sep 18, 2025 · Updated 9 months ago
- Codes and Data for ACL 2024 Paper "Faithful Logical Reasoning via Symbolic Chain-of-Thought". ☆206 · Jan 29, 2026 · Updated 5 months ago
- ☆1,034 · Dec 17, 2024 · Updated last year
- Official code for paper Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation ☆21 · Feb 29, 2024 · Updated 2 years ago
- A library for efficient patching and automatic circuit discovery. ☆97 · Dec 31, 2025 · Updated 6 months ago
- Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper. ☆44 · May 2, 2026 · Updated 2 months ago
- ☆15 · Feb 21, 2024 · Updated 2 years ago
- ☆93 · Aug 18, 2024 · Updated last year
- ☆131 · Oct 1, 2024 · Updated last year
- Code for reproducing the ACL'23 paper: Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments ☆78 · May 17, 2025 · Updated last year
- Training Large Language Model to Reason in a Continuous Latent Space ☆1,644 · Jun 10, 2026 · Updated 3 weeks ago
- The source code for running LLMs on the AAAR-1.0 benchmark. ☆20 · Apr 5, 2025 · Updated last year
- Engine for collecting, uploading, and downloading model activations ☆29 · Apr 2, 2025 · Updated last year
- Code for 'The Geometry of Categorical and Hierarchical Concepts in Large Language Models' (ICLR 2025, Oral) ☆115 · Feb 11, 2025 · Updated last year
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆98 · Nov 17, 2024 · Updated last year
- Code for Quiet-STaR ☆738 · Aug 21, 2024 · Updated last year
- MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer (EMNLP 2025) ☆12 · Apr 18, 2025 · Updated last year