guolinke / fused_opsLinks

☆10

Alternatives and similar repositories for fused_ops

Users that are interested in fused_ops are comparing it to the libraries listed below

Sorting:

instance-wise-ordered-transformer / IOT
☆20Updated 4 years ago
LAION-AI / Conditional-Pretraining-of-Large-Language-Models
☆37Updated 2 years ago
sIncerass / powernorm
[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845
☆120Updated 4 years ago
mlpc-ucsd / BERT_Convolutions
(ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models.
☆21Updated 3 years ago
HKUNLP / efficient-attention
[EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling
☆87Updated 2 years ago
layer6ai-labs / T-Fixup
Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization"
☆89Updated 4 years ago
CyndxAI / QKNorm
Code for the paper "Query-Key Normalization for Transformers"
☆49Updated 4 years ago
microsoft / DualLearning
A dual learning toolkit developed by Microsoft Research
☆73Updated 2 years ago
xtinkt / editable
A supplementary code for Editable Neural Networks, an ICLR 2020 submission.
☆46Updated 5 years ago
MAC-AutoML / YOCO-BERT
The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu…
☆48Updated 4 years ago
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65Updated 4 years ago
L1aoXingyu / llm-infer-bench
☆12Updated 2 years ago
cheneydon / efficient-bert
This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron …
☆33Updated 2 years ago
lucidrains / triangle-multiplicative-module
Implementation of the Triangle Multiplicative module, used in Alphafold2 as an efficient way to mix rows or columns of a 2d feature map, …
☆39Updated 4 years ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
SciMT / SciMT-benchmark
☆11Updated last year
zomux / lanmt-ebm
lanmt ebm
☆12Updated 5 years ago
CHARM-Tx / linear_mem_attention_pytorch
Unofficially Implements https://arxiv.org/abs/2112.05682 to get Linear Memory Cost on Attention for PyTorch
☆12Updated 3 years ago
NVIDIA / dllogger
A logging tool for deep learning.
☆63Updated 8 months ago
pytorch / torchdistx
Torch Distributed Experimental
☆117Updated last year
XinbangZhang / DATA-NAS
Codes for DATA: Differentiable ArchiTecture Approximation.
☆11Updated 4 years ago
lucidrains / lie-transformer-pytorch
Implementation of Lie Transformer, Equivariant Self-Attention, in Pytorch
☆96Updated 4 years ago
LiyuanLucasLiu / Torch-Scope
A Toolkit for Training, Tracking, Saving Models and Syncing Results
☆62Updated 5 years ago
eric-haibin-lin / verl-data
☆12Updated 6 months ago
bojone / univae
基于Transformer的单模型、多尺度的VAE模型
☆57Updated 4 years ago
Alab-NII / Awesome-SciLM
Pre-trained Language Model for Scientific Text
☆46Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆49Updated 2 years ago
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆29Updated last year
dguo98 / DiffPruning
Parameter Efficient Transfer Learning with Diff Pruning
☆74Updated 4 years ago
BBuf / flash-rwkv
☆32Updated last year