DACUS1995 / pytorch-mmap-dataset
A custom PyTorch Dataset extension that provides faster iteration and better RAM usage
☆43 · Updated last year
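The core idea behind a dataset like this, reading samples lazily through a memory map so the full array is paged in from disk on demand rather than loaded into RAM, can be sketched with `numpy.memmap`. The class name, file layout, and constructor parameters below are illustrative assumptions, not pytorch-mmap-dataset's actual API:

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class MemmapDataset(Dataset):
    """Sketch of a memory-mapped dataset: fixed-shape samples are read
    lazily from a raw binary file, so the whole array never has to
    reside in RAM at once."""

    def __init__(self, path, shape, dtype=np.float32):
        # mode="r" maps the file read-only; no data is read until indexed.
        self.data = np.memmap(path, dtype=dtype, mode="r", shape=shape)

    def __len__(self):
        return self.data.shape[0]

    def __getitem__(self, idx):
        # np.array(...) copies the row out of the map so the returned
        # tensor owns its memory, independent of the underlying file.
        return torch.from_numpy(np.array(self.data[idx]))
```

An instance plugs into a standard `DataLoader` like any other map-style `Dataset`; the OS page cache handles which parts of the file actually stay resident.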
Alternatives and similar repositories for pytorch-mmap-dataset
Users interested in pytorch-mmap-dataset are comparing it to the libraries listed below.
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆60 · Updated last year
- Code for the DDP tutorial ☆32 · Updated 3 years ago
- PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation ☆27 · Updated 3 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆72 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆54 · Updated 8 months ago
- The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…) ☆67 · Updated 3 years ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆56 · Updated last year
- A repository for DenseSSMs ☆87 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆60 · Updated 3 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆47 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆85 · Updated 2 years ago
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆21 · Updated last year
- Axial Positional Embedding for PyTorch ☆79 · Updated 2 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024) ☆25 · Updated 11 months ago
- Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding ☆48 · Updated 7 months ago
- A Tight-fisted Optimizer ☆47 · Updated 2 years ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆38 · Updated 3 years ago
- Differentiable top-k operator ☆21 · Updated 4 months ago
- ☆30 · Updated 11 months ago
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling ☆79 · Updated last year
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆34 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆82 · Updated 11 months ago
- Lion and Adam optimization comparison ☆61 · Updated 2 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆64 · Updated last year
- [ICLR 2022] Official implementation of cosformer-attention in cosFormer: Rethinking Softmax in Attention ☆192 · Updated 2 years ago
- Implementation of a modular, high-performance, and simplistic Mamba for high-speed applications ☆34 · Updated 6 months ago
- Several types of attention modules written in PyTorch for learning purposes ☆52 · Updated 7 months ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax In Attention" ☆44 · Updated 3 years ago
- Experimental scripts for researching data-adaptive learning rate scheduling ☆23 · Updated last year
- PyTorch cyclic cosine decay learning rate scheduler ☆48 · Updated 3 years ago