Stonesjtu / pytorch-learning
Learning notes from studying the PyTorch source code
☆24 · Updated 6 years ago
Alternatives and similar repositories for pytorch-learning
Users interested in pytorch-learning are comparing it to the libraries listed below:
- Distributed ML Optimizer · ☆32 · Updated 3 years ago
- Research and development for optimizing transformers · ☆126 · Updated 4 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory · ☆131 · Updated 3 years ago
- ☆108 · Updated last year
- ☆105 · Updated 9 months ago
- Fast Discounted Cumulative Sums in PyTorch · ☆96 · Updated 3 years ago
- PyTorch library for factorized L0-based pruning · ☆45 · Updated last year
- ☆38 · Updated last year
- Experiment in using Tangent to autodiff Triton · ☆79 · Updated last year
- Official PyTorch implementation of Length-Adaptive Transformer (ACL 2021) · ☆101 · Updated 4 years ago
- Customized matrix multiplication kernels · ☆54 · Updated 3 years ago
- Implementation of a Transformer, but completely in Triton · ☆266 · Updated 3 years ago
- Torch Distributed Experimental · ☆117 · Updated 10 months ago
- Python pdb for multiple processes · ☆44 · Updated last week
- Efficient, check-pointed data loading for deep learning with massive data sets · ☆208 · Updated last year
- Simple and efficient PyTorch-native transformer training and inference (batched) · ☆75 · Updated last year
- Example Python package with a pybind11 C++ extension · ☆57 · Updated 4 years ago
- Block-sparse primitives for PyTorch · ☆155 · Updated 4 years ago
- Profile the GPU memory usage of every line in PyTorch code · ☆82 · Updated 6 years ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton · ☆67 · Updated 10 months ago
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md · ☆24 · Updated 2 years ago
- Block Sparse movement pruning · ☆79 · Updated 4 years ago
- [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion · ☆40 · Updated 4 years ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training · ☆210 · Updated 9 months ago
- A minimal PyTorch Lightning OpenAI GPT with DeepSpeed training! · ☆111 · Updated 2 years ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE · ☆17 · Updated this week
- ☆250 · Updated 10 months ago
- Transformer with Mu-Parameterization, implemented in JAX/Flax. Supports FSDP on TPU pods · ☆30 · Updated last week
- ☆47 · Updated 4 years ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 · ☆44 · Updated 10 months ago