thomasahle / arithmetic-transformer
Teaching Addition to Small Transformers
☆17Updated 2 years ago
Alternatives and similar repositories for arithmetic-transformer
Users that are interested in arithmetic-transformer are comparing it to the libraries listed below
- Evolution Pretraining Fully in Int Formats☆136Updated last month
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence☆61Updated 3 years ago
- Fast Text Classification with Compressors dictionary☆150Updated 2 years ago
- JAX-like function transformation engine, but micro: microjax☆34Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)☆98Updated last year
- Amos optimizer with JEstimator lib.☆82Updated last year
- Simplified implementation of a UMAP-like dimensionality reduction algorithm☆53Updated last year
- gzip Predicts Data-dependent Scaling Laws☆34Updated last year
- RWKV model implementation☆37Updated 2 years ago
- Learning Universal Predictors☆81Updated last year
- Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …☆116Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax☆92Updated last year
- JAX/Flax implementation of the Hyena Hierarchy☆34Updated 2 years ago
- Repo for solving ARC problems with a Neural Cellular Automata☆23Updated 8 months ago
- ☆111Updated 6 months ago
- Simplex Random Feature attention, in PyTorch☆75Updated 2 years ago
- A place to store reusable transformer components of my own creation or found on the interwebs☆72Updated this week
- A virtual dog-sitter that tracks, classifies, and responds to dog audio.☆37Updated 3 years ago
- ☆18Updated last year
- A stateful pytree library for training neural networks.☆22Updated 5 months ago
- Latent Diffusion Language Models☆70Updated 2 years ago
- The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…☆121Updated 2 years ago
- A minimal PyTorch re-implementation of GPT (Generative Pretrained Transformer) language model training☆18Updated 2 years ago
- Shaping capabilities with token-level pretraining data filtering☆75Updated last week
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator☆32Updated 2 years ago
- ☆59Updated 2 months ago
- Various handy scripts to quickly setup new Linux and Windows sandboxes, containers and WSL.☆40Updated 2 weeks ago
- AAAI 2022 Paper: Bet even Beth Harmon couldn't learn chess like that :)☆38Updated 4 years ago
- Code for minimum-entropy coupling.☆32Updated last month
- Training code for Sparse Autoencoders on Embedding models☆39Updated 11 months ago