lxxue / prefix_sum
A PyTorch wrapper of parallel exclusive scan in CUDA
☆9 · Updated last year
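An exclusive scan (exclusive prefix sum) returns, at position i, the sum of all inputs strictly before i, so the first output is always the identity (0). The sketch below shows one common GPU-friendly scheme, the work-efficient up-sweep/down-sweep (Blelloch) scan. Each inner loop corresponds to one step that a CUDA kernel would execute in parallel; here the steps are simulated sequentially in plain Python. This is an illustrative sketch under those assumptions, not prefix_sum's API, and whether the package uses this exact scheme is not stated here. The function name `exclusive_scan_blelloch` is hypothetical.

```python
from typing import List


def exclusive_scan_blelloch(xs: List[int]) -> List[int]:
    """Exclusive prefix sum via Blelloch's up-sweep/down-sweep scan.

    Hypothetical helper for illustration; each inner `for` loop is one
    parallel step on a GPU, simulated sequentially here.
    """
    n = len(xs)
    if n == 0:
        return []
    # Pad to the next power of two so the implicit balanced tree is complete.
    size = 1
    while size < n:
        size *= 2
    a = list(xs) + [0] * (size - n)

    # Up-sweep (reduce): each right child accumulates its left sibling.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            a[i] += a[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then push prefixes down the tree.
    a[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = a[i - stride]
            a[i - stride] = a[i]  # left child receives the parent's prefix
            a[i] += left          # right child adds the left subtree's sum
        stride //= 2

    return a[:n]


print(exclusive_scan_blelloch([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```

For comparison, in stock PyTorch an exclusive prefix sum along a dimension can also be written as `torch.cumsum(x, dim=0) - x`, with possible floating-point rounding differences for non-integer inputs.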
Alternatives and similar repositories for prefix_sum:
Users interested in prefix_sum also compare it with the libraries listed below:
- Parallel Associative Scan for Language Models ☆18 · Updated last year
- Blog post ☆16 · Updated 11 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆121 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆169 · Updated 4 months ago
- ☆37 · Updated last year
- Experiment using Tangent to autodiff Triton ☆74 · Updated 11 months ago
- ☆32 · Updated last year
- ☆50 · Updated 3 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆58 · Updated 3 months ago
- Implementation of GateLoop Transformer in PyTorch and JAX ☆87 · Updated 6 months ago
- Implementations of various linear RNN layers using PyTorch and Triton ☆49 · Updated last year
- Efficient PScan implementation in PyTorch ☆15 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated 7 months ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- ☆31 · Updated last month
- HGRN2: Gated Linear RNNs with State Expansion ☆52 · Updated 4 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆82 · Updated 11 months ago
- ☆46 · Updated 11 months ago
- Triton implementation of the HyperAttention algorithm ☆46 · Updated last year
- ☆31 · Updated 9 months ago
- Parallelizing non-linear sequential models over the sequence length ☆49 · Updated last week
- PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition… ☆146 · Updated last month
- ☆24 · Updated 3 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆21 · Updated 2 weeks ago
- The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha… ☆19 · Updated 2 years ago
- Implementation of the PSGD optimizer in JAX ☆26 · Updated 2 weeks ago
- ☆51 · Updated 7 months ago
- A State-Space Model with Rational Transfer Function Representation ☆76 · Updated 8 months ago
- ☆74 · Updated last year
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆69 · Updated last month
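Several of the listed projects (the parallel associative scan and PScan entries, the linear RNN layers) generalize the prefix sum to a first-order linear recurrence h[t] = a[t] * h[t-1] + b[t]. The standard trick is that the pairs (a, b) compose associatively: applying (a1, b1) and then (a2, b2) equals (a1 * a2, a2 * b1 + b2), so any parallel scan over that operator yields every h[t] in O(log n) parallel steps. The sketch below is a generic illustration of that technique using a Hillis-Steele inclusive scan, assuming h[-1] = 0; it is not code from any listed repository, and all function names are hypothetical. A plain prefix sum is the special case a[t] = 1.

```python
from typing import List, Tuple

Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose h -> a1*h + b1 followed by h -> a2*h + b2 (associative)."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def linear_recurrence_scan(a: List[float], b: List[float]) -> List[float]:
    """h[t] = a[t] * h[t-1] + b[t] with h[-1] = 0, via a Hillis-Steele scan.

    Hypothetical helper; each `while` iteration is one parallel step,
    simulated here with a list comprehension over the previous values.
    """
    elems: List[Pair] = list(zip(a, b))
    n = len(elems)
    offset = 1
    while offset < n:
        # Every new element reads only the previous step's elements.
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(n)
        ]
        offset *= 2
    # With h[-1] = 0, the b-component of each prefix composition is h[t].
    return [h for _, h in elems]


def linear_recurrence_sequential(a: List[float], b: List[float]) -> List[float]:
    """Reference O(n) sequential loop for checking the scan."""
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return out


a = [0.5, 2.0, 1.0, -1.0, 3.0]
b = [1.0, 0.0, 2.0, 1.0, 0.5]
print(linear_recurrence_scan(a, b))        # [1.0, 2.0, 4.0, -3.0, -8.5]
print(linear_recurrence_sequential(a, b))  # [1.0, 2.0, 4.0, -3.0, -8.5]
```

Hillis-Steele is chosen here for brevity; it does O(n log n) work, whereas a Blelloch-style scan over the same `combine` operator would do O(n) work.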