srush / Transformer-Puzzles
Puzzles for exploring transformers
☆344 Updated 2 years ago
Alternatives and similar repositories for Transformer-Puzzles:
Users who are interested in Transformer-Puzzles compare it to the libraries listed below.
- ☆430 Updated 6 months ago
- What would you do with 1000 H100s... ☆1,043 Updated last year
- A puzzle to learn about prompting ☆127 Updated last year
- Annotated version of the Mamba paper ☆483 Updated last year
- ☆217 Updated 9 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆569 Updated this week
- An interactive exploration of Transformer programming. ☆262 Updated last year
- For optimization algorithm research and development. ☆509 Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆536 Updated last week
- Solve puzzles. Learn CUDA. ☆64 Updated last year
- Building blocks for foundation models. ☆487 Updated last year
- Understand and test language model architectures on synthetic tasks. ☆194 Updated last month
- seqax = sequence modeling + JAX ☆155 Updated 3 weeks ago
- ☆446 Updated 9 months ago
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆379 Updated 2 weeks ago
- Puzzles for learning Triton ☆1,603 Updated 5 months ago
- Deep learning for dummies. All the practical details and useful utilities that go into working with real models. ☆788 Updated this week
- ☆166 Updated last year
- Open weights language model from Google DeepMind, based on Griffin. ☆636 Updated 2 months ago
- JAX implementation of the Llama 2 model ☆218 Updated last year
- Language Modeling with the H3 State Space Model ☆520 Updated last year
- ☆301 Updated 10 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆130 Updated last year
- Everything you want to know about Google Cloud TPU ☆527 Updated 9 months ago
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 Updated 9 months ago
- Implementation of https://srush.github.io/annotated-s4 ☆494 Updated 2 years ago
- Universal Tensor Operations in Einstein-Inspired Notation for Python. ☆367 Updated 3 weeks ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆211 Updated last year
- A repository for log-time feedforward networks ☆222 Updated last year
- An implementation of the transformer architecture onto an Nvidia CUDA kernel ☆180 Updated last year