DACUS1995 / pytorch-mmap-dataset
A custom PyTorch Dataset extension that provides faster iteration and better RAM usage
☆42 Updated 11 months ago
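A minimal sketch of the idea behind such a dataset, assuming the samples are stored row-wise in a single .npy file; the class name MmapDataset and the file layout are illustrative assumptions, not the actual pytorch-mmap-dataset API:

```python
# Illustrative sketch only, not the pytorch-mmap-dataset API.
# Assumes samples are stored row-wise in a single .npy file on disk.
import numpy as np
import torch
from torch.utils.data import Dataset


class MmapDataset(Dataset):
    def __init__(self, path: str):
        # mmap_mode="r" maps the file into virtual memory instead of
        # reading it all into RAM; pages are loaded lazily on access.
        self.data = np.load(path, mmap_mode="r")

    def __len__(self) -> int:
        return len(self.data)

    def __getitem__(self, idx: int) -> torch.Tensor:
        # Copy the single row so the returned tensor owns its memory
        # and can be collated, pinned, or moved to GPU by a DataLoader.
        return torch.from_numpy(np.array(self.data[idx]))
```

Wrapped in a standard torch.utils.data.DataLoader, only the rows touched by each batch are paged in from disk, which is where the lower RAM footprint comes from.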
Alternatives and similar repositories for pytorch-mmap-dataset:
Users interested in pytorch-mmap-dataset are comparing it to the libraries listed below.
- Several types of attention modules written in PyTorch for learning purposes ☆46 Updated 5 months ago
- Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory and energy consumption ☆98 Updated last year
- ☆30 Updated 9 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆51 Updated last year
- A repository for DenseSSMs ☆87 Updated 10 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 Updated 8 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 Updated last year
- PyTorch implementation of Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation ☆28 Updated 3 years ago
- State Space Models ☆64 Updated 10 months ago
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in Pytorch ☆37 Updated 2 years ago
- ☆8 Updated last year
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ☆59 Updated last year
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) ☆30 Updated last year
- PyTorch, PyTorch Lightning framework for trying knowledge distillation in image classification problems ☆32 Updated 7 months ago
- Linear Attention Sequence Parallelism (LASP) ☆79 Updated 9 months ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆71 Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆80 Updated last year
- Possibly useful materials for learning the RWKV language model. ☆24 Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 Updated 2 years ago
- State-of-the-art data augmentation search algorithms in PyTorch ☆47 Updated last year
- ☆100 Updated 11 months ago
- Transformers w/o Attention, based fully on MLPs ☆93 Updated 10 months ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆86 Updated last year
- A Tight-fisted Optimizer ☆47 Updated last year
- Implementation of a Light Recurrent Unit in Pytorch ☆47 Updated 4 months ago
- Testing various improvements to Ranger21 for 2022 ☆18 Updated 3 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆46 Updated last year
- Lion and Adam optimization comparison ☆59 Updated 2 years ago
- Implementation of Multistream Transformers in Pytorch ☆53 Updated 3 years ago
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning ☆40 Updated 2 years ago