rlin27 / DeButLinks

Codes of the paper Deformable Butterfly: A Highly Structured and Sparse Linear Transform.

☆13

Alternatives and similar repositories for DeBut

Users that are interested in DeBut are comparing it to the libraries listed below

Sorting:

lucidrains / rela-transformer
Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012
☆49Updated 3 years ago
jungokasai / T2R
☆14Updated 3 years ago
sanagno / adaptively_sparse_attention
☆21Updated 2 years ago
chenjoya / dropit
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023)
☆31Updated 2 years ago
RAIVNLab / LLC
☆14Updated 4 years ago
juntang-zhuang / ACProp-Optimizer
Implementation for ACProp ( Momentum centering and asynchronous update for adaptive gradient methdos, NeurIPS 2021)
☆16Updated 4 years ago
renll / SeqBoat
[NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling
☆40Updated 2 years ago
ahennequ / pytorch-custom-mma
☆29Updated 3 years ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆62Updated last year
sustcsonglin / gated_linear_attention_layer
☆32Updated last year
jenni-ai / T2FW
Fine-Tuning Pre-trained Transformers into Decaying Fast Weights
☆19Updated 3 years ago
RobertCsordas / moe_layer
sigma-MoE layer
☆20Updated last year
RobertCsordas / linear_layer_as_attention
The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …
☆16Updated 5 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24Updated last year
VITA-Group / Data-Efficient-Scaling
[ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang
☆14Updated last year
lucidrains / remixer-pytorch
Implementation of the Remixer Block from the Remixer paper, in Pytorch
☆36Updated 4 years ago
smonsays / hypernetwork-attention
Official code for the paper "Attention as a Hypernetwork"
☆46Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49Updated 2 years ago
srush / tangent
Source-to-Source Debuggable Derivatives in Pure Python
☆15Updated last year
duterscmy / CD-MoE
Official PyTorch implementation of CD-MOE
☆12Updated 8 months ago
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39Updated 2 years ago
ag1988 / top_k_attention
The accompanying code for "Memory-efficient Transformers via Top-k Attention" (Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonatha…
☆70Updated 4 years ago
acosharma / elita-transformer
Official Repository for Efficient Linear-Time Attention Transformers.
☆18Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated 3 months ago
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10Updated 2 years ago
epfml / pam
☆16Updated 2 years ago
RAIVNLab / MatFormer-OLMo
Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…
☆30Updated 2 years ago
buttercutter / Mamba_SSM
A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)
☆22Updated last year
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆30Updated last year
OpenNLPLab / HGRN2
HGRN2: Gated Linear RNNs with State Expansion
☆55Updated last year