thomasahle / arithmetic-transformer

Teaching Addition to Small Transformers

☆17

Alternatives and similar repositories for arithmetic-transformer

Users who are interested in arithmetic-transformer are comparing it to the repositories listed below.

geov-ai / geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆121Updated 2 years ago
cyrilou242 / ftcc
Fast Text Classification with Compressors dictionary
☆149Updated 2 years ago
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)
☆95Updated 10 months ago
sytelus / pcprep
Various handy scripts to quickly set up new Linux and Windows sandboxes, containers and WSL.
☆40Updated this week
jxiw / BiGS
Official repository of Pretraining Without Attention (BiGS). BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆114Updated last year
shtoshni / learning-chess-blindfolded
AAAI 2022 Paper: Bet even Beth Harmon couldn't learn chess like that :)
☆38Updated 4 years ago
RobertRiachi / nanoPALM
☆144Updated 2 years ago
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆58Updated 3 years ago
IDSIA / modern-srwm
Official repository for the paper "A Modern Self-Referential Weight Matrix That Learns to Modify Itself" (ICML 2022 & NeurIPS 2021 Deep R…
☆171Updated 4 months ago
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82Updated last year
kmkolasinski / nano-umap
Simplified implementation of a UMAP-like dimensionality reduction algorithm
☆53Updated 11 months ago
LAION-AI / AIW
Alice in Wonderland code base for experiments and raw experiments data
☆131Updated last month
sradc / pretraining-BERT
Pre-train BERT from scratch, with HuggingFace. Accompanies the blog post: sidsite.com/posts/bert-from-scratch
☆43Updated 5 months ago
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆102Updated 3 months ago
KhoomeiK / complexity-scaling
gzip Predicts Data-dependent Scaling Laws
☆34Updated last year
irhum / hyena
JAX/Flax implementation of the Hyena Hierarchy
☆34Updated 2 years ago
crowsonkb / LDLM
Latent Diffusion Language Models
☆69Updated 2 years ago
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
glassroom / heinsen_routing
Reference implementation of "An Algorithm for Routing Vectors in Sequences" (Heinsen, 2022) and "An Algorithm for Routing Capsules in All…
☆172Updated 2 years ago
Aleph-Alpha-Research / trigrams
☆57Updated 3 weeks ago
Cerebras / gigaGPT
a small code base for training large models
☆310Updated 6 months ago
google-research / sloe-logistic
☆29Updated last year
srush / drop7
☆18Updated last year
drisspg / transformer_nuggets
A place to store reusable transformer components of my own creation or found on the interwebs
☆60Updated last week
crowsonkb / dice-mc
DiCE: The Infinitely Differentiable Monte-Carlo Estimator
☆32Updated 2 years ago
dvruette / barrel-rec-pytorch
☆53Updated last year
yk / litter
☆70Updated last year
sdascoli / boolformer
☆163Updated last year
tcapelle / mixtral
Mixtral finetuning
☆19Updated last year
jxbz / agd
Automatic gradient descent
☆215Updated 2 years ago