dhcode-cpp / online-softmaxLinks

simplest online-softmax notebook for explain Flash Attention

☆13

Alternatives and similar repositories for online-softmax

Users that are interested in online-softmax are comparing it to the libraries listed below

Sorting:

shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆174Updated 10 months ago
dhcode-cpp / easy-dualpipe
Pipeline-Parallel Lecture: Simplest Dualpipe Implementation.
☆27Updated 2 months ago
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆84Updated last month
Ascend / AscendSpeed
☆79Updated last year
TRT2022 / trtllm-llama
☢️ TensorRT 2023复赛——基于TensorRT-LLM的Llama模型推断加速优化
☆50Updated 2 years ago
zms1999 / SmartMoE
A MoE impl for PyTorch, [ATC'23] SmartMoE
☆70Updated 2 years ago
sunkx109 / llama
Inference code for LLaMA models
☆127Updated 2 years ago
cauyxy / bilivideos
☆51Updated 2 years ago
preacher-1 / MLA_tutorial
from MHA, MQA, GQA to MLA by 苏剑林, with code
☆33Updated 9 months ago
Oneflow-Inc / models
Models and examples built with OneFlow
☆100Updated last year
DD-DuDa / BitDistiller
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
☆129Updated last year
qingkelab / qingketalk
青稞Talk
☆168Updated this week
mdy666 / mdy_triton
☆149Updated 4 months ago
firechecking / CleanTransformer
an implementation of transformer, bert, gpt, and diffusion models for learning purposes
☆159Updated last year
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆108Updated 3 weeks ago
Tencent / AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
☆207Updated last week
liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆48Updated 3 months ago
hyperai / triton-cn
Triton Documentation in Chinese Simplified / Triton 中文文档
☆91Updated last week
Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17Updated 2 years ago
bobo0810 / LearnDeepSpeed
DeepSpeed教程 & 示例注释 & 学习笔记（大模型高效训练）
☆183Updated 2 years ago
smart-lty / ParallelSpeculativeDecoding
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
☆131Updated last month
madsys-dev / deepseekv2-profile
☆152Updated 8 months ago
tingshua-yts / BetterDL
☆37Updated 2 years ago
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆109Updated 7 months ago
ironartisan / awesome-compression1
模型压缩的小白入门教程
☆22Updated last year
genggui001 / Megatron-DeepSpeed-Llama
☆84Updated 2 years ago
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆144Updated 2 months ago
void-main / FasterTransformer
Transformer related optimization, including BERT, GPT
☆59Updated 2 years ago
DD-DuDa / TensorRT-in-Action
TensorRT-in-Action 是一个 GitHub 代码库，提供了使用 TensorRT 的代码示例，并有对应 Jupyter Notebook。
☆15Updated 2 years ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58Updated last year