mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024)
☆190 · Updated last year
Alternatives and similar repositories for arithmetic
Users who are interested in arithmetic are comparing it to the libraries listed below.
- A MAD laboratory to improve AI architecture designs 🧪 ☆120 · Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆134 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆217 · Updated last week
- Language models scale reliably with over-training and on downstream tasks ☆97 · Updated last year
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆218 · Updated 6 months ago
- ☆180 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆235 · Updated 2 weeks ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆148 · Updated 4 months ago
- ☆180 · Updated 2 months ago
- ☆134 · Updated 2 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆127 · Updated last year
- nanoGPT-like codebase for LLM training ☆98 · Updated last month
- ☆65 · Updated last year
- ☆190 · Updated 2 weeks ago
- Normalized Transformer (nGPT) ☆183 · Updated 7 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆173 · Updated 5 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆75 · Updated 6 months ago
- Extract full next-token probabilities via language model APIs ☆247 · Updated last year
- Bootstrapping ARC ☆127 · Updated 7 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 · Updated 2 months ago
- ☆53 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆174 · Updated 5 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 · Updated 6 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆136 · Updated last year
- Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models … ☆184 · Updated last week
- PyTorch library for Active Fine-Tuning ☆80 · Updated 4 months ago
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆127 · Updated 9 months ago
- Create feature-centric and prompt-centric visualizations for sparse autoencoders (like those from Anthropic's published research). ☆202 · Updated 6 months ago
- ☆78 · Updated 11 months ago
- ☆53 · Updated last year