Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts

☆22

Alternatives and similar repositories for NeurIPS-WANT-submission-efficient-parallelization-layouts

Users who are interested in NeurIPS-WANT-submission-efficient-parallelization-layouts are comparing it to the libraries listed below.

Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆36Updated 6 months ago
dame-cell / Triformer
Transformers components but in Triton
☆34Updated 5 months ago
sustcsonglin / mamba-triton
☆48Updated last year
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
BBuf / flash-rwkv
☆32Updated last year
Doraemonzzz / xmixers
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Updated last month
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
00ffcc / chunkRWKV6
Continuous batching and parallel acceleration for RWKV6
☆22Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆87Updated last year
linxihui / dkernel
☆20Updated 6 months ago
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆25Updated last week
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆120Updated last year
sail-sg / SimLayerKV
The official implementation of the paper "SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction".
☆50Updated last year
MayDomine / Seq1F1B
Sequence-level 1F1B schedule for LLMs.
☆18Updated last year
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆111Updated 7 months ago
epfml / dynamic-sparse-flash-attention
☆149Updated 2 years ago
stanford-futuredata / stk
☆112Updated last year
HazyResearch / prefix-linear-attention
☆56Updated last year
Dao-AILab / grouped-latent-attention
☆130Updated 5 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆61Updated last year
Raincleared-Song / sparse_gpu_operator
GPU operators for sparse tensor operations
☆35Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48Updated 2 years ago
exists-forall / striped_attention
☆41Updated last year
tanyuqian / redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆68Updated 10 months ago
jungokasai / T2R
☆14Updated 2 years ago
Infini-AI-Lab / gsm_infinite
☆55Updated 4 months ago
sail-sg / VocabularyParallelism
Vocabulary Parallelism
☆23Updated 7 months ago
FasterDecoding / TEAL
☆145Updated 8 months ago
SqueezeAILab / SqueezedAttention
[ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference
☆54Updated 11 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆84Updated last year