feather-ai / transformers-tutorial
The code for the video tutorial series on building a Transformer from scratch: https://www.youtube.com/watch?v=XR4VDnJzB8o
☆18 Updated last year
Related projects
Alternatives and complementary repositories for transformers-tutorial
- This repository hosts the code to port NumPy model weights of BiT-ResNets to TensorFlow SavedModel format. ☆14 Updated 2 years ago
- A short article showing how to load PyTorch models with linear memory consumption ☆34 Updated 2 years ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆22 Updated last year
- Implementation of numerous Vision Transformers in Google's JAX and Flax. ☆20 Updated 2 years ago
- The Forward-Forward Algorithm for Drug Discovery ☆34 Updated last year
- Implements MLP-Mixer (https://arxiv.org/abs/2105.01601) with the CIFAR-10 dataset. ☆54 Updated 2 years ago
- Adversarial examples for the new ConvNeXt architecture ☆20 Updated 2 years ago
- NLP Examples using the 🤗 libraries ☆42 Updated 3 years ago
- Basic guidance on how to contribute to Papers with Code ☆20 Updated 2 years ago
- Cyclemoid implementation for PyTorch ☆87 Updated 2 years ago
- Notebooks of cool EBM visualizations ☆16 Updated 3 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆47 Updated 2 years ago
- ☆24 Updated 2 years ago
- JAX implementation of "Learning to learn by gradient descent by gradient descent" ☆25 Updated 3 weeks ago
- Official repository for our ICLR 2021 paper "Evaluating the Disentanglement of Deep Generative Models with Manifold Topology" ☆36 Updated 3 years ago
- Shows how to do parameter ensembling using differential evolution. ☆10 Updated 2 years ago
- ☆17 Updated 3 weeks ago
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆57 Updated last year
- Yet another mini autodiff system for educational purposes ☆27 Updated 11 months ago
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆46 Updated 3 months ago
- A port of the Mistral-7B model in JAX ☆30 Updated 4 months ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆30 Updated last year
- Code of the NVIDIA winning solution to the 2nd OGB-LSC challenge at NeurIPS 2022 with the PCQM4Mv2 dataset ☆17 Updated last year
- Experiments on GPT-3's ability to fit numerical models in-context. ☆14 Updated 2 years ago
- Code repo for the ICLR 2024 blog post titled "Building Diffusion Model's theory from ground up" ☆13 Updated 11 months ago
- HomebrewNLP in JAX flavour for maintainable TPU training ☆46 Updated 9 months ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 Updated 2 years ago
- A framework for implementing equivariant DL ☆10 Updated 3 years ago
- Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction ☆31 Updated 2 years ago
- FID computation in JAX/Flax. ☆24 Updated 3 months ago