thomasahle / arithmetic-transformer
Teaching Addition to Small Transformers
☆16 · Updated last year
Alternatives and similar repositories for arithmetic-transformer
Users interested in arithmetic-transformer are comparing it to the repositories listed below.
- gzip Predicts Data-dependent Scaling Laws · ☆35 · Updated last year
- Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023) · ☆94 · Updated 7 months ago
- ☆18 · Updated last year
- JAX/Flax implementation of the Hyena Hierarchy · ☆34 · Updated 2 years ago
- ☆56 · Updated 2 months ago
- Residual Quantization Autoencoder, used for interpreting LLMs · ☆12 · Updated 6 months ago
- Learning Universal Predictors · ☆77 · Updated 11 months ago
- ☆98 · Updated 5 months ago
- A stateful pytree library for training neural networks. · ☆22 · Updated 2 years ago
- ☆163 · Updated last year
- Mixtral finetuning · ☆19 · Updated last year
- Entailment self-training · ☆25 · Updated 2 years ago
- Fast Text Classification with Compressors dictionary · ☆150 · Updated last year
- The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).… · ☆121 · Updated 2 years ago
- A place to store reusable transformer components of my own creation or found on the interwebs · ☆56 · Updated this week
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" · ☆33 · Updated 6 months ago
- ☆29 · Updated last year
- Pre-train BERT from scratch with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch · ☆41 · Updated last month
- Various handy scripts to quickly set up new Linux and Windows sandboxes, containers and WSL. · ☆40 · Updated 2 months ago
- Simplifying parsing of large JSON Lines files in NLP workflows · ☆12 · Updated 3 years ago
- Code for minimum-entropy coupling. · ☆32 · Updated last year
- Official repository of Pretraining Without Attention (BiGS); BiGS is the first model to achieve BERT-level transfer learning on the GLUE … · ☆113 · Updated last year
- My explorations into editing the knowledge and memories of an attention network · ☆35 · Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… · ☆34 · Updated last year
- AAAI 2022 Paper: Bet even Beth Harmon couldn't learn chess like that :) · ☆38 · Updated 4 years ago
- ☆53 · Updated last year
- Ludax is a domain-specific language for board games that automatically compiles into hardware-accelerated learning environments with the … · ☆19 · Updated 2 months ago
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) · ☆33 · Updated last year
- Amos optimizer with JEstimator lib. · ☆82 · Updated last year
- You should use PySR to find scaling laws. Here's an example. · ☆33 · Updated last year