feather-ai / transformers-tutorial
The code for the video tutorial series on building a Transformer from scratch: https://www.youtube.com/watch?v=XR4VDnJzB8o
☆ 18 · Updated last year
Related projects
Alternatives and complementary repositories for transformers-tutorial
- A short article showing how to load PyTorch models with linear memory consumption ☆ 34 · Updated 2 years ago
- This repository hosts the code to port NumPy model weights of BiT-ResNets to the TensorFlow SavedModel format. ☆ 14 · Updated 2 years ago
- The Forward-Forward Algorithm for Drug Discovery ☆ 34 · Updated last year
- PyTorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network ☆ 47 · Updated 3 months ago
- Implementation of numerous Vision Transformers in Google's JAX and Flax. ☆ 20 · Updated 2 years ago
- Implementation of Tranception, an attention network paired with retrieval that is SOTA for protein fitness prediction ☆ 31 · Updated 2 years ago
- 🧼🔎 A holistic self-supervised data-cleaning strategy to detect irrelevant samples, near duplicates, and label errors (NeurIPS'24). ☆ 24 · Updated 2 weeks ago
- Implements MLP-Mixer (https://arxiv.org/abs/2105.01601) with the CIFAR-10 dataset. ☆ 54 · Updated 2 years ago
- Sequence models in NumPy ☆ 25 · Updated 4 years ago
- All my experiments with the various transformers and transformer frameworks available ☆ 14 · Updated 3 years ago
- Unofficial PyTorch implementation of the Involution layer from CVPR 2021 ☆ 45 · Updated 3 years ago
- JAX implementation of "Learning to Learn by Gradient Descent by Gradient Descent" ☆ 26 · Updated last month
- ☆ 73 · Updated 2 years ago
- Named Entity Recognition with a decoder-only (autoregressive) LLM using HuggingFace ☆ 33 · Updated last week
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆ 33 · Updated 4 years ago
- HomebrewNLP in JAX flavour for maintainable TPU training ☆ 46 · Updated 10 months ago
- Contains my experiments with the `big_vision` repo to train ViTs on ImageNet-1k. ☆ 22 · Updated last year
- Implementation of "Analysing Mathematical Reasoning Abilities of Neural Models" ☆ 28 · Updated last year
- Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction" ☆ 57 · Updated last year
- ☆ 24 · Updated 2 years ago
- Code for the papers "Linear Algebra with Transformers" (TMLR) and "What is my Math Transformer Doing?" (AI for Maths Workshop, NeurIPS 2022) ☆ 64 · Updated 3 months ago
- Notebooks of cool EBM visualizations ☆ 16 · Updated 3 years ago
- ML/DL math and method notes ☆ 57 · Updated 11 months ago
- Code of the NVIDIA winning solution to the 2nd OGB-LSC challenge at NeurIPS 2022, on the PCQM4Mv2 dataset ☆ 17 · Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆ 49 · Updated 2 years ago
- A port of the Mistral-7B model to JAX ☆ 30 · Updated 4 months ago
- A project template using PyTorch Lightning, Pydantic, and more, with MNIST training as an example. ☆ 26 · Updated 2 years ago
- Code repo for the ICLR 2024 blog post "Building Diffusion Model's theory from ground up" ☆ 13 · Updated 11 months ago
- ☆ 36 · Updated this week
- This project shows how to derive the total number of training tokens in a large text dataset from 🤗 datasets with Apache Beam and Data… ☆ 24 · Updated 2 years ago