lyj20071013/Triton-FlashAttention

This repository contains multiple implementations of Flash Attention optimized with Triton kernels, showcasing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques such as FP8 quantization and prefetching.

☆11

Alternatives and similar repositories for Triton-FlashAttention

Users who are interested in Triton-FlashAttention are comparing it to the repositories listed below.

flashserve / PAT
Prefix-Aware Attention for LLM Decoding
☆29 · Jan 23, 2026 · Updated last month
xiangruihu / bilibili
☆15 · Aug 3, 2025 · Updated 7 months ago
yiboCode / MFAE-YOLO
MFAE-YOLO is an object detection method for aerial remote sensing images
☆15 · Jan 27, 2026 · Updated last month
adulau / netbeacon
netbeacon - monitoring your network capture, NIDS or network analysis process
☆19 · Oct 26, 2013 · Updated 12 years ago
zhangjiong724 / autoassist-exp
Code accompanying the NeurIPS 2019 paper AutoAssist: A Framework to Accelerate Training of Deep Neural Networks.
☆14 · Oct 3, 2022 · Updated 3 years ago
lzhangbv / acpsgd
[ICDCS 2023] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning
☆10 · Apr 28, 2023 · Updated 2 years ago
wangshicheng1225 / LoRDMA
☆11 · Oct 21, 2023 · Updated 2 years ago
fpgasystems / fpga-hyperloglog
FPGA-based HyperLogLog Accelerator
☆12 · Jul 13, 2020 · Updated 5 years ago
LZKPKU / PKUCVXOPT
Peking University Convex Optimization Course given by Professor Wen Zaiwen
☆11 · Jan 11, 2018 · Updated 8 years ago
zhongxinghong / Java-Jieba
A Java port of Jieba 0.39 that supports all core features of the original Jieba
☆12 · Feb 14, 2019 · Updated 7 years ago
icsnju / VeriXmith
A tool for cross-checking Verilog compilers
☆14 · Apr 16, 2025 · Updated 10 months ago
shijy16 / ACETest
For our ISSTA'23 paper ACETest: Automated Constraint Extraction for Testing Deep Learning Operators
☆13 · Mar 30, 2024 · Updated last year
gzz2000 / RoSSH
🛠Robust SSH: auto-reconnect SSH session that preserves your running shell and command. Intuitive, no server-side setup, aimed at simplic…
☆13 · Nov 14, 2025 · Updated 3 months ago
CompML / survey-deep-gradient-compression
☆10 · Jun 4, 2021 · Updated 4 years ago
pku-minic / next-gen-ir-proposal
Proposal for the next generation of course-oriented IR.
☆10 · Dec 24, 2021 · Updated 4 years ago
feifeibear / PSTensor
PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.
☆10 · Feb 10, 2022 · Updated 4 years ago
NetExperimentEasy / DDCC
☆20 · Jul 29, 2024 · Updated last year
mit-plv / hemiola
A Coq framework to support structural design and proof of hardware cache-coherence protocols
☆14 · May 7, 2022 · Updated 3 years ago
CSU-NetLab / A2TP-Eurosys2023
☆11 · Mar 13, 2023 · Updated 2 years ago
jiashu-z / how-to-plot
How to plot for papers, slides, demos, etc.
☆10 · Apr 7, 2022 · Updated 3 years ago
PatrickGuo / Mistify
☆10 · May 16, 2021 · Updated 4 years ago
astra-sim / astra-network-ns3
☆10 · Jun 28, 2025 · Updated 8 months ago
AFKD98 / FLOAT
☆12 · May 18, 2024 · Updated last year
tgangwani / RegAlloc
Chaitin-Briggs register-allocation algorithm (LLVM back-end)
☆12 · Jan 6, 2016 · Updated 10 years ago
magic3007 / MiniJava-Compiler
🕹 Implementation for the course Compiling Engineering (2020 Spring) at Peking University, adapted from the UCLA CS 132 project.
☆10 · Jun 21, 2020 · Updated 5 years ago
P4xos / P4xos
☆11 · Sep 22, 2017 · Updated 8 years ago
josehu07 / summerset
Distributed, Replicated, Protocol-generic Key-value Store in Async Rust For SMR Protocols Research
☆17 · Updated this week
datenlord / etcd-client
☆15 · Jul 18, 2023 · Updated 2 years ago
TJ-CSCCG / tongji-cv
Tongji University résumé template with minor localization tweaks (generated from fky2015/resume-ng)
☆14 · Dec 3, 2023 · Updated 2 years ago
StingySketch / Stingy-Sketch
☆13 · Jan 21, 2022 · Updated 4 years ago
zxytim / arithmetic-encoding-compression
☆11 · Apr 3, 2023 · Updated 2 years ago
esposem / Kernel_Paxos
Kernel Module that implements Paxos protocol
☆12 · Oct 23, 2020 · Updated 5 years ago
AIoT-MLSys-Lab / MEDA
[NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference
☆18 · Jun 19, 2025 · Updated 8 months ago
kazukiosawa / pipe-fisher
☆10 · Apr 29, 2023 · Updated 2 years ago
I-Doctor / gnn-acceleration-framework-with-FPGA
including compiler to encode DGL GNN model to instructions, runtime software to transfer data and control the accelerator, and hardware v…
☆14 · Nov 19, 2023 · Updated 2 years ago
unikraft / lib-lwip
Unikraft port of the lwip network stack
☆15 · Feb 26, 2026 · Updated last week
HKBU-HPML / OMGS-SGD
Layer-wise Sparsification of Distributed Deep Learning
☆10 · Jul 6, 2020 · Updated 5 years ago
NekoPii / TJDR
A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset
☆15 · Dec 9, 2025 · Updated 3 months ago
ISS-Kerui / BIRA-NET-BILINEAR-ATTENTION-NET-FOR-DIABETIC-RETINOPATHY-GRADING
☆11 · Apr 26, 2019 · Updated 6 years ago