Stonesjtu / pytorch-learning
Learning notes taken while studying the PyTorch source code
☆24 · Updated 6 years ago
Alternatives and similar repositories for pytorch-learning
Users interested in pytorch-learning are comparing it to the libraries listed below.
- Efficient, check-pointed data loading for deep learning with massive data sets. ☆209 · Updated 2 years ago
- Implementation of a Transformer, but completely in Triton ☆275 · Updated 3 years ago
- ☆122 · Updated last year
- Distributed ML Optimizer ☆33 · Updated 4 years ago
- Torch Distributed Experimental ☆117 · Updated last year
- Python pdb for multiple processes ☆59 · Updated 4 months ago
- ☆189 · Updated 2 weeks ago
- Fast Discounted Cumulative Sums in PyTorch ☆96 · Updated 4 years ago
- Profile the GPU memory usage of every line in PyTorch code ☆83 · Updated 7 years ago
- ☆253 · Updated last year
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆136 · Updated 3 years ago
- Some common Hugging Face transformers in maximal update parametrization (µP) ☆82 · Updated 3 years ago
- Research and development for optimizing transformers ☆131 · Updated 4 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆161 · Updated 3 weeks ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆96 · Updated 2 years ago
- ☆148 · Updated 2 years ago
- A minimal PyTorch Lightning OpenAI GPT with DeepSpeed training! ☆113 · Updated 2 years ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆79 · Updated last year
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆117 · Updated 3 years ago
- A case study of efficient training of large language models using commodity hardware. ☆68 · Updated 3 years ago
- PyTorch-centric eager-mode debugger ☆48 · Updated 9 months ago
- JAX bindings for Flash Attention v2 ☆95 · Updated last month
- PyTorch RFCs (experimental) ☆135 · Updated 4 months ago
- PyTorch library for factorized L0-based pruning. ☆45 · Updated 2 years ago
- ☆89 · Updated last year
- ☆62 · Updated 3 years ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Train very large language models in JAX. ☆209 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. ☆70 · Updated last year
- Make Triton easier ☆47 · Updated last year