mcleish7 / arithmeticLinks

Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)

☆190

Alternatives and similar repositories for arithmetic

Users that are interested in arithmetic are comparing it to the libraries listed below

Sorting:

EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆149Updated last month
google-deepmind / mishax
☆134Updated 4 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆221Updated 3 weeks ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆123Updated 7 months ago
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training
☆130Updated last year
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆166Updated 6 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆225Updated 2 weeks ago
epfml / llm-baselines
nanoGPT-like codebase for LLM training
☆102Updated 2 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated last month
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆77Updated 8 months ago
apoorvkh / academic-pretraining
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
☆143Updated 2 months ago
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157Updated 3 months ago
epfml / DenseFormer
☆81Updated last year
justinchiu / openlogprobs
Extract full next-token probabilities via language model APIs
☆247Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97Updated last year
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆173Updated 6 months ago
RobertCsordas / moeut
☆83Updated 11 months ago
tilde-research / sieve
Applying SAEs for fine-grained control
☆23Updated 7 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆185Updated 8 months ago
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆149Updated 6 months ago
koayon / awesome-adaptive-computation
A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE).
☆149Updated 7 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆88Updated 10 months ago
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆200Updated this week
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆177Updated 10 months ago
cloneofsimo / min-fsdp
☆83Updated last year
llm-random / llm-random
☆192Updated last week
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆100Updated last week
normster / llm_rules
RuLES: a benchmark for evaluating rule-following in language models
☆227Updated 5 months ago
da03 / Internalize_CoT_Step_by_Step
☆187Updated 3 months ago