PAIR-code / tiny-transformers
☆16 · Updated this week
Related projects
Alternatives and complementary repositories for tiny-transformers
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆21 · Updated 4 years ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆30 · Updated last year
- Simplifying parsing of large jsonline files in NLP workflows ☆12 · Updated 2 years ago
- A framework for implementing equivariant DL ☆10 · Updated 3 years ago
- Recursive Least Squares (RLS) with a neural network for fast learning ☆52 · Updated last year
- Implementation of Metaformer, but in an autoregressive manner ☆23 · Updated 2 years ago
- A collection of optimizers, some arcane, others well known, for Flax ☆29 · Updated 3 years ago
- An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols ☆14 · Updated 3 years ago
- High-performance tokenized language data loader for Python C++ extension ☆12 · Updated 4 months ago
- Implementation of Token Shift GPT, an autoregressive model that relies solely on shifting the sequence space for mixing ☆47 · Updated 2 years ago
- Usable implementation of the Emerging Symbol Binding Network (ESBN), in Pytorch ☆23 · Updated 3 years ago
- Minimum Description Length probing for neural network representations ☆16 · Updated last week
- Implementation of a holodeck, written in Pytorch ☆17 · Updated last year
- A GPT, made only of MLPs, in Jax ☆55 · Updated 3 years ago
- Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets' ☆38 · Updated 2 years ago
- RWKV model implementation ☆38 · Updated last year
- Implementations of growing and pruning in neural networks ☆21 · Updated last year
- Usable implementation of Mogrifier, a circuit from DeepMind for enhancing LSTMs and potentially other networks ☆16 · Updated 5 months ago
- AdaCat ☆49 · Updated 2 years ago
- Local Attention - Flax module for Jax ☆20 · Updated 3 years ago