mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆198 Updated last year
Alternatives and similar repositories for arithmetic
Users who are interested in arithmetic are comparing it to the repositories listed below
- Understand and test language model architectures on synthetic tasks. ☆252 Updated 3 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆137 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆186 Updated 2 weeks ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆153 Updated last year
- ☆152 Updated 5 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆235 Updated 6 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 Updated 4 months ago
- ☆74 Updated last year
- Extract full next-token probabilities via language model APIs ☆248 Updated last year
- nanoGPT-like codebase for LLM training ☆113 Updated 3 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆247 Updated 8 months ago
- Functional Benchmarks and the Reasoning Gap ☆89 Updated last year
- RuLES: a benchmark for evaluating rule-following in language models ☆248 Updated 11 months ago
- ☆57 Updated last year
- Language models scale reliably with over-training and on downstream tasks ☆99 Updated last year
- ☆185 Updated 2 years ago
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆175 Updated last year
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆83 Updated last year
- Open source interpretability artefacts for R1. ☆170 Updated 9 months ago
- PyTorch library for Active Fine-Tuning ☆96 Updated 4 months ago
- ☆53 Updated 2 years ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆135 Updated 3 years ago
- ☆53 Updated last year
- ☆91 Updated last year
- ☆214 Updated last month
- Can Language Models Solve Olympiad Programming? ☆123 Updated last year
- Mixture of A Million Experts ☆53 Updated last year
- Bootstrapping ARC ☆155 Updated last year