thomasahle / arithmetic-transformer
Teaching Addition to Small Transformers
☆17Updated last year
Alternatives and similar repositories for arithmetic-transformer
Users who are interested in arithmetic-transformer are comparing it to the repositories listed below
- The GeoV model is a large language model designed by Georges Harik that uses Rotary Positional Embeddings with Relative distances (RoPER).…☆121Updated 2 years ago
- Fast Text Classification with Compressors dictionary☆149Updated 2 years ago
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)☆95Updated 10 months ago
- Various handy scripts to quickly set up new Linux and Windows sandboxes, containers and WSL.☆40Updated this week
- Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …☆114Updated last year
- AAAI 2022 Paper: Bet even Beth Harmon couldn't learn chess like that :)☆38Updated 4 years ago
- ☆144Updated 2 years ago
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence☆58Updated 3 years ago
- Official repository for the paper "A Modern Self-Referential Weight Matrix That Learns to Modify Itself" (ICML 2022 & NeurIPS 2021 Deep R…☆171Updated 4 months ago
- Amos optimizer with JEstimator lib.☆82Updated last year
- Simplified implementation of a UMAP-like dimensionality reduction algorithm☆53Updated 11 months ago
- Alice in Wonderland code base for experiments and raw experiments data☆131Updated last month
- Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch☆43Updated 5 months ago
- ☆102Updated 3 months ago
- gzip Predicts Data-dependent Scaling Laws☆34Updated last year
- JAX/Flax implementation of the Hyena Hierarchy☆34Updated 2 years ago
- Latent Diffusion Language Models☆69Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…☆35Updated 2 years ago
- Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All…☆172Updated 2 years ago
- ☆57Updated 3 weeks ago
- a small code base for training large models☆310Updated 6 months ago
- ☆29Updated last year
- ☆18Updated last year
- A place to store reusable transformer components of my own creation or found on the interwebs☆60Updated last week
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator☆32Updated 2 years ago
- ☆53Updated last year
- ☆70Updated last year
- ☆163Updated last year
- Mixtral finetuning☆19Updated last year
- Automatic gradient descent☆215Updated 2 years ago