pytorch / maskedtensor
MaskedTensors for PyTorch
☆38 · Updated 2 years ago
Alternatives and similar repositories for maskedtensor
Users who are interested in maskedtensor are comparing it to the libraries listed below.
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆29 · Updated 3 weeks ago
- ☆29 · Updated 2 years ago
- [TMLR 2022] Curvature access through the generalized Gauss-Newton's low-rank structure: Eigenvalues, eigenvectors, directional derivative… ☆17 · Updated last year
- Implementation of some personal helper functions for Einops, my most favorite tensor manipulation library ❤️ ☆53 · Updated 2 years ago
- ☆100 · Updated 7 months ago
- ☆50 · Updated 3 months ago
- Experiment of using Tangent to autodiff triton ☆74 · Updated last year
- ☆47 · Updated 2 years ago
- [ICML 2024] SIRFShampoo: Structured inverse- and root-free Shampoo in PyTorch (https://arxiv.org/abs/2402.03496) ☆14 · Updated 2 months ago
- ☆33 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in Pytorch ☆97 · Updated last year
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆27 · Updated 3 years ago
- Blog post ☆16 · Updated 11 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆59 · Updated 4 months ago
- ☆30 · Updated this week
- Fast Discounted Cumulative Sums in PyTorch ☆95 · Updated 3 years ago
- AdaCat ☆49 · Updated 2 years ago
- Implementation of LogAvgExp for Pytorch ☆32 · Updated 2 years ago
- Code for the paper PermuteFormer ☆42 · Updated 3 years ago
- [ICML 2024] SINGD: KFAC-like Structured Inverse-Free Natural Gradient Descent (http://arxiv.org/abs/2312.05705) ☆21 · Updated 2 months ago
- Implementation of deep implicit attention in PyTorch ☆64 · Updated 3 years ago
- Pytorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆154 · Updated last month
- gpu tester detects broken and slow gpus in a cluster ☆67 · Updated last year
- Tensor Parallelism with JAX + Shard Map ☆11 · Updated last year
- A selection of neural network models ported from torchvision for JAX & Flax. ☆44 · Updated 4 years ago
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- Fast training of unitary deep network layers from low-rank updates ☆28 · Updated 2 years ago
- Another attempt at a long-context / efficient transformer by me ☆37 · Updated 2 years ago
- Implementation of Hourglass Transformer, in Pytorch, from Google and OpenAI ☆84 · Updated 3 years ago