thomasahle / arithmetic-transformer
Teaching Addition to Small Transformers
☆14 Updated 11 months ago
Related projects
Alternatives and complementary repositories for arithmetic-transformer
- gzip Predicts Data-dependent Scaling Laws ☆32 Updated 5 months ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated last week
- microjax: a micro-scale, JAX-like function transformation engine ☆26 Updated 2 weeks ago
- RWKV model implementation ☆38 Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs ☆43 Updated last week
- ☆43 Updated 2 months ago
- ☆48 Updated this week
- Automatically take good care of your preemptible TPUs ☆31 Updated last year
- ☆17 Updated 2 weeks ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆30 Updated last year
- ☆18 Updated 6 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- ☆29 Updated last year
- ☆53 Updated 9 months ago
- LLM training in simple, raw C/CUDA ☆12 Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 6 months ago
- Make triton easier ☆41 Updated 5 months ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) ☆86 Updated 10 months ago
- REBUS: A Robust Evaluation Benchmark of Understanding Symbols ☆12 Updated 3 months ago
- Experiments for efforts to train a new and improved t5 ☆76 Updated 6 months ago
- Minimum Description Length probing for neural network representations ☆16 Updated last week
- Code for minimum-entropy coupling. ☆29 Updated 4 months ago
- Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch ☆39 Updated last year
- Serialize JAX, Flax, Haiku, or Objax model params with 🤗`safetensors` ☆42 Updated 5 months ago
- Understanding how features learned by neural networks evolve throughout training ☆31 Updated 2 weeks ago
- ☆19 Updated 6 months ago
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆34 Updated 7 months ago
- You should use PySR to find scaling laws. Here's an example. ☆31 Updated last year