Fast Block Sparse Matrices for Pytorch
☆549 · Jan 21, 2021 · Updated 5 years ago
Alternatives and similar repositories for pytorch_block_sparse
Users who are interested in pytorch_block_sparse are comparing it to the libraries listed below.
- Block-sparse primitives for PyTorch ☆158 · Apr 5, 2021 · Updated 4 years ago
- Efficient GPU kernels for block-sparse matrix multiplication and convolution ☆1,064 · Jun 8, 2023 · Updated 2 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS. ☆50 · Mar 1, 2018 · Updated 8 years ago
- ☆221 · Jun 8, 2020 · Updated 5 years ago
- Pytorch library for fast transformer implementations ☆1,763 · Mar 23, 2023 · Updated 2 years ago
- Transformer training code for sequential tasks ☆609 · Sep 14, 2021 · Updated 4 years ago
- PyTorch extensions for high performance and large scale training. ☆3,403 · Apr 26, 2025 · Updated 10 months ago
- Fast, general, and tested differentiable structured prediction in PyTorch ☆1,124 · Apr 20, 2022 · Updated 3 years ago
- SentAugment is a data augmentation technique for NLP that retrieves similar sentences from a large bank of sentences. It can be used in c… ☆359 · Feb 22, 2022 · Updated 4 years ago
- FastFormers - highly efficient transformer models for NLU ☆709 · Mar 21, 2025 · Updated 11 months ago
- Longformer: The Long-Document Transformer ☆2,189 · Feb 8, 2023 · Updated 3 years ago
- A library of GPU kernels for sparse matrix operations. ☆283 · Nov 24, 2020 · Updated 5 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆786 · Dec 16, 2023 · Updated 2 years ago
- higher is a pytorch library allowing users to obtain higher order gradients over losses spanning training loops rather than individual tr… ☆1,628 · Mar 25, 2022 · Updated 3 years ago
- ☆21 · Mar 15, 2023 · Updated 3 years ago
- PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations ☆1,093 · Mar 9, 2026 · Updated last week
- Examples of using sparse attention, as in "Generating Long Sequences with Sparse Transformers" ☆1,611 · Aug 12, 2020 · Updated 5 years ago
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆433 · Aug 17, 2022 · Updated 3 years ago
- A python library for highly configurable transformers - easing model architecture search and experimentation. ☆48 · Nov 30, 2021 · Updated 4 years ago
- A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch ☆8,936 · Updated this week
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆335 · Jan 26, 2023 · Updated 3 years ago
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆9,430 · Feb 20, 2026 · Updated last month
- Single Headed Attention RNN - "Stop thinking with your head" ☆1,181 · Nov 27, 2021 · Updated 4 years ago
- Type annotations and dynamic checking for a tensor's shape, dtype, names, etc. ☆1,475 · May 2, 2025 · Updated 10 months ago
- Shape and dimension inference (Keras-like) for PyTorch layers and neural networks ☆570 · Jun 13, 2022 · Updated 3 years ago
- Papers & presentation materials from Hugging Face's internal science day ☆2,054 · Oct 31, 2020 · Updated 5 years ago
- Repository for the paper "Optimal Subarchitecture Extraction for BERT" ☆470 · Jun 22, 2022 · Updated 3 years ago
- DeLighT: Very Deep and Light-Weight Transformers ☆469 · Oct 16, 2020 · Updated 5 years ago
- ☆774 · Jan 27, 2024 · Updated 2 years ago
- Prune a model while finetuning or training. ☆406 · Jun 21, 2022 · Updated 3 years ago
- On the Variance of the Adaptive Learning Rate and Beyond ☆2,549 · Jul 31, 2021 · Updated 4 years ago
- Efficient, check-pointed data loading for deep learning with massive data sets. ☆211 · Jun 12, 2023 · Updated 2 years ago
- 🚪✊Knock Knock: Get notified when your training ends with only two additional lines of code ☆2,825 · Jun 23, 2023 · Updated 2 years ago
- [ICLR 2020] Lite Transformer with Long-Short Range Attention ☆610 · Jul 11, 2024 · Updated last year
- KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows ☆1,162 · Feb 6, 2026 · Updated last month
- Repository for NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients" ☆1,068 · Aug 9, 2024 · Updated last year
- Cascaded Text Generation with Markov Transformers ☆130 · Mar 20, 2023 · Updated 3 years ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ☆791 · Apr 24, 2023 · Updated 2 years ago
- PyTorch implementation of L2L execution algorithm ☆108 · Jan 16, 2023 · Updated 3 years ago