DACUS1995 / pytorch-mmap-dataset
A custom PyTorch Dataset extension that provides faster iteration and better RAM usage
☆45 · Updated last year
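The repository's own API is not shown on this page; the snippet below is only a minimal sketch of the general technique it describes (memory-mapping sample data from disk so the full array never has to be loaded into RAM), assuming fixed-size float32 samples stored contiguously in a single binary file. The class name, file name, and shapes are illustrative, not the library's actual interface.

```python
# Illustrative sketch of a memory-mapped PyTorch Dataset (not the
# pytorch-mmap-dataset API). Assumes fixed-size float32 samples stored
# contiguously in one binary file.
import numpy as np
import torch
from torch.utils.data import Dataset


class MmapDataset(Dataset):
    def __init__(self, path, num_samples, sample_shape):
        # np.memmap maps the file into virtual memory; pages are read
        # from disk lazily, so resident RAM stays roughly constant
        # regardless of the dataset size.
        self.data = np.memmap(
            path, dtype=np.float32, mode="r",
            shape=(num_samples, *sample_shape),
        )

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Copy the slice so the returned tensor owns its memory and the
        # underlying mapping stays read-only.
        return torch.from_numpy(np.array(self.data[idx]))


# Hypothetical usage:
# ds = MmapDataset("features.bin", num_samples=100_000, sample_shape=(3, 224, 224))
# loader = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=4)
```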
Alternatives and similar repositories for pytorch-mmap-dataset
Users interested in pytorch-mmap-dataset are comparing it to the libraries listed below:
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆74 · Updated 2 years ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory and energy consumption ☆107 · Updated 2 years ago
- PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" ☆88 · Updated this week
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha… ☆70 · Updated 4 years ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆22 · Updated last year
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… ☆76 · Updated 3 years ago
- Code repository of the paper "Modelling Long Range Dependencies in ND: From Task-Specific to a General Purpose CNN" https://arxiv.org/abs… ☆183 · Updated 4 months ago
- Several types of attention modules written in PyTorch for learning purposes ☆52 · Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆216 · Updated 2 years ago
- Recent Advances in MLP-based Models (MLP is all you need!) ☆116 · Updated 2 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆49 · Updated 2 years ago
- Implementation of "Attention Is Off By One" by Evan Miller ☆197 · Updated 2 years ago
- Unofficial PyTorch implementation of Google's FNet: Mixing Tokens with Fourier Transforms. With checkpoints. ☆77 · Updated 3 years ago
- Transformers w/o Attention, based fully on MLPs ☆95 · Updated last year
- A repository for DenseSSMs ☆88 · Updated last year
- [ICLR 2022] Official implementation of cosformer-attention in "cosFormer: Rethinking Softmax in Attention" ☆195 · Updated 2 years ago
- A collection of differentiable SVD methods and ICCV21 "Why Approximate Matrix Square Root Outperforms Accurate SVD in Global Covariance P… ☆78 · Updated last year
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆78 · Updated 2 years ago
- ☆183 · Updated last year
- A simple minimal implementation of Reversible Vision Transformers ☆125 · Updated last year
- ☆22 · Updated 2 years ago
- Context manager to profile the forward and backward times of PyTorch's nn.Module ☆82 · Updated last year
- The official implementation of the paper "Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation" ☆20 · Updated 9 months ago
- PyTorch cyclic cosine decay learning rate scheduler ☆49 · Updated 4 years ago
- PyTorch implementation of "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" ☆29 · Updated 3 years ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) ☆25 · Updated 2 months ago
- ☆33 · Updated 3 months ago
- Torch Distributed Experimental ☆117 · Updated last year
- Implementation of a memory-efficient multi-head attention as proposed in the paper "Self-attention Does Not Need O(n²) Memory" (see the sketch after this list) ☆383 · Updated 2 years ago
- Figures I made during my PhD in Deep Learning, for my models and for context ☆83 · Updated 4 years ago
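For the memory-efficient attention entry above, the core idea (as described in "Self-attention Does Not Need O(n²) Memory") is to process keys and values in chunks with a running, numerically stable softmax, so the full attention matrix is never materialized. The sketch below is a minimal single-head illustration of that chunking trick, not the linked repository's code; the function name and chunk sizes are assumptions.

```python
# Minimal sketch of chunked, memory-efficient attention with a streaming
# softmax (illustrative only; not the listed repository's implementation).
import torch


def chunked_attention(q, k, v, q_chunk=1024, k_chunk=1024):
    # q, k, v: (batch, seq, dim)
    scale = q.shape[-1] ** -0.5
    out_chunks = []
    for qi in q.split(q_chunk, dim=1):
        # Running accumulators for the streaming softmax over key chunks.
        acc = torch.zeros_like(qi)                                   # weighted sum of values
        row_max = torch.full(qi.shape[:-1], float("-inf"),
                             dtype=q.dtype, device=q.device)         # running row max
        row_sum = torch.zeros(qi.shape[:-1],
                              dtype=q.dtype, device=q.device)        # running softmax denominator
        for ki, vi in zip(k.split(k_chunk, dim=1), v.split(k_chunk, dim=1)):
            s = torch.einsum("bqd,bkd->bqk", qi, ki) * scale
            new_max = torch.maximum(row_max, s.amax(dim=-1))
            p = torch.exp(s - new_max.unsqueeze(-1))
            # Rescale previous accumulators to the new max before adding.
            correction = torch.exp(row_max - new_max)
            acc = acc * correction.unsqueeze(-1) + torch.einsum("bqk,bkd->bqd", p, vi)
            row_sum = row_sum * correction + p.sum(dim=-1)
            row_max = new_max
        out_chunks.append(acc / row_sum.unsqueeze(-1))
    return torch.cat(out_chunks, dim=1)
```

Peak memory per step is proportional to q_chunk × k_chunk instead of the full seq × seq attention matrix, at the cost of recomputing scores chunk by chunk.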