lxxue / prefix_sum

A PyTorch wrapper of parallel exclusive scan in CUDA

☆12

Alternatives and similar repositories for prefix_sum

Users who are interested in prefix_sum compare it to the libraries listed below.

johnryan465 / pscan
☆40 Updated last year
maximzubkov / fft-scan
Efficient PScan implementation in PyTorch
☆16 Updated last year
proger / nanokitchen
Parallel Associative Scan for Language Models
☆18 Updated last year
proger / accelerated-scan
Accelerated First Order Parallel Associative Scan
☆182 Updated 10 months ago
glassroom / heinsen_sequence
Code implementing "Efficient Parallelization of a Ubiquitous Sequential Computation" (Heinsen, 2023)
☆94 Updated 7 months ago
thjashin / multires-conv
Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023)
☆124 Updated last year
shikaiqiu / compute-better-spent
☆53 Updated 9 months ago
sustcsonglin / gated_linear_attention_layer
☆32 Updated last year
AndPotap / einsum-search
☆32 Updated 9 months ago
eamartin / parallelizing_linear_rnns
☆43 Updated 7 years ago
sustcsonglin / flash-linear-rnn
Implementations of various linear RNN layers using PyTorch and Triton
☆53 Updated last year
machine-discovery / deer
Parallelizing non-linear sequential models over the sequence length
☆52 Updated 3 weeks ago
sustcsonglin / mamba-triton
☆48 Updated last year
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in PyTorch and JAX
☆89 Updated last year
srush / mamba-scans
Blog post
☆17 Updated last year
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated last year
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆79 Updated last year
tml-epfl / why-weight-decay
Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024]
☆66 Updated 9 months ago
ruke1ire / RTF
A State-Space Model with Rational Transfer Function Representation.
☆79 Updated last year
NicolasZucchet / minimal-LRU
Unofficial implementation of the Linear Recurrent Unit (LRU, Orvieto et al. 2023)
☆53 Updated 8 months ago
radarFudan / mamba-minimal-jax
☆31 Updated 7 months ago
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆84 Updated last year
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆66 Updated last year
graphcore-research / unit-scaling
A library for unit scaling in PyTorch
☆125 Updated this week
lixilinx / psgd_torch
PyTorch implementation of preconditioned stochastic gradient descent (Kron and affine preconditioner, low-rank approximation precondition…
☆179 Updated last month
automl / unlocking_state_tracking
Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…
☆15 Updated 4 months ago
ag1988 / dss
Sequence Modeling with Structured State Spaces
☆65 Updated 2 years ago
ag1988 / dlr
The accompanying code for "Simplifying and Understanding State Space Models with Diagonal Linear RNNs" (Ankit Gupta, Harsh Mehta, Jonatha…
☆22 Updated 2 years ago
LIONS-EPFL / scion
☆26 Updated 2 weeks ago
google-deepmind / spectral_ssm
☆32 Updated last year