Stonesjtu / pytorch-learning
Learning notes from reading the PyTorch source code
☆24 Updated 5 years ago
Related projects
Alternatives and complementary repositories for pytorch-learning
- Distributed ML Optimizer ☆30 Updated 3 years ago
- Fast Discounted Cumulative Sums in PyTorch ☆95 Updated 3 years ago
- A case study of efficient training of large language models using commodity hardware. ☆68 Updated 2 years ago
- Official Pytorch Implementation of Length-Adaptive Transformer (ACL 2021) ☆100 Updated 4 years ago
- ☆77 Updated 5 months ago
- Implementation of a Transformer, but completely in Triton ☆248 Updated 2 years ago
- This repository contains example code to build models on TPUs ☆30 Updated last year
- some common Huggingface transformers in maximal update parametrization (µP) ☆76 Updated 2 years ago
- [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion ☆40 Updated 3 years ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆47 Updated 2 years ago
- A library to create and manage configuration files, especially for machine learning projects. ☆77 Updated 2 years ago
- Profile the GPU memory usage of every line in a Pytorch code ☆82 Updated 6 years ago
- Torch Distributed Experimental ☆116 Updated 3 months ago
- ☆88 Updated 2 months ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆116 Updated 2 years ago
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆54 Updated 3 months ago
- ☆237 Updated 3 months ago
- A minimal PyTorch Lightning OpenAI GPT w DeepSpeed Training! ☆110 Updated last year
- ☆132 Updated last year
- Another attempt at a long-context / efficient transformer by me ☆37 Updated 2 years ago
- ☆71 Updated 6 months ago
- Pytorch library for factorized L0-based pruning. ☆43 Updated last year
- ☆57 Updated 2 years ago
- Research and development for optimizing transformers ☆125 Updated 3 years ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆29 Updated 2 weeks ago
- Implementation of a Transformer that Ponders, using the scheme from the PonderNet paper ☆79 Updated 3 years ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆61 Updated 7 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆38 Updated 10 months ago