OSU-NLP-Group / GrokkedTransformerLinks

Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'

☆225

Alternatives and similar repositories for GrokkedTransformer

Users that are interested in GrokkedTransformer are comparing it to the libraries listed below

Sorting:

kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149Updated 6 months ago
da03 / Internalize_CoT_Step_by_Step
☆187Updated 3 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 6 months ago
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆190Updated last year
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆321Updated 8 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆217Updated last week
SalesforceAIResearch / LaTRO
☆117Updated 5 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 10 months ago
da03 / implicit_chain_of_thought
☆135Updated 8 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Langauge Models with a Trillion-Token Datastore".
☆207Updated last month
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆174Updated 4 months ago
WildEval / ZeroEval
A simple unified framework for evaluating LLMs
☆229Updated 3 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157Updated 3 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89Updated 8 months ago
allenai / WildBench
Benchmarking LLMs with Challenging Tasks from Real Users
☆233Updated 9 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆178Updated last month
princeton-nlp / USACO
Can Language Models Solve Olympiad Programming?
☆119Updated 6 months ago
Mohammadjafari80 / GSM8K-RLVR
A simplified implementation for experimenting with RLVR on GSM8K, This repository provides a starting point for exploring reasoning.
☆117Updated 5 months ago
allenai / olmes
Reproducible, flexible LLM evaluations
☆226Updated 3 weeks ago
architsharma97 / dpo-rlaif
☆99Updated last year
JacobPfau / fillerTokens
☆67Updated last year
ScalingIntelligence / large_language_monkeys
☆101Updated 10 months ago
SALT-NLP / demonstrated-feedback
☆124Updated 10 months ago
expz / quiet-star
Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)
☆54Updated 11 months ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
booydar / babilong
BABILong is a benchmark for LLM evaluation using the needle-in-a-haystack approach.
☆208Updated 2 months ago
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242Updated 8 months ago
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆101Updated last week
felipemaiapolo / tinyBenchmarks
Evaluating LLMs with fewer examples
☆160Updated last year