This repository contains multiple implementations of Flash Attention optimized with Triton kernels, showcasing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques like FP8 quantization and prefetching.
☆11Mar 26, 2026Updated 3 months ago
Alternatives and similar repositories for Triton-FlashAttention
Users that are interested in Triton-FlashAttention are comparing it to the libraries listed below.
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆35Dec 24, 2025Updated 6 months ago
- ☆15Aug 3, 2025Updated 10 months ago
- Created as a self-reference for h264 decoding on NVIDIA GPUs using the NVCUVID API.☆15Jan 1, 2016Updated 10 years ago
- Project for sharing nlp algorithms☆14Feb 19, 2019Updated 7 years ago
- Real-time decision features without streaming infra. Turn live events into product reflexes — no Kafka, no Flink, no feature store.☆135May 30, 2026Updated last month
- Serverless Firebase Functions☆12Oct 15, 2019Updated 6 years ago
- ☆16Aug 18, 2022Updated 3 years ago
- ☆57Jun 10, 2026Updated 3 weeks ago
- ☆18Apr 2, 2025Updated last year
- Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371☆19Dec 13, 2022Updated 3 years ago
- ☆19Jan 10, 2023Updated 3 years ago
- A toolbox of commonly used deep learning components, procedures and applications☆18Sep 8, 2023Updated 2 years ago
- [A chat software demo] a chat software powered by libevent/mysql and qt☆10Sep 10, 2021Updated 4 years ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"☆43May 1, 2025Updated last year
- ☆11Apr 26, 2019Updated 7 years ago
- A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset☆17Dec 9, 2025Updated 6 months ago
- serverless vscode webide☆17Apr 14, 2023Updated 3 years ago
- ☆15Dec 21, 2025Updated 6 months ago
- analyse problems of AI with Math and Code☆31Jul 28, 2025Updated 11 months ago
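- Tongji University résumé template with a few minor localization changes (generated from fky2015/resume-ng)☆18Dec 3, 2023Updated 2 years ago
- Learning RDMA programming☆25Dec 6, 2021Updated 4 years ago
- Third course project for an operating systems class: a simple file system☆12Jun 24, 2021Updated 5 years ago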
- ☆27Dec 24, 2021Updated 4 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference☆21Jun 19, 2025Updated last year
- Accommodating Large Language Model Training over Heterogeneous Environment.☆32Mar 13, 2025Updated last year
- My old book about programming for Symbian 9.x based smartphones, in Russian☆14Jul 8, 2015Updated 10 years ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding☆37Jul 22, 2025Updated 11 months ago
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization☆18May 29, 2025Updated last year
- TodoMVC with Akka-http, Scala.js, Autowire and React☆32Jun 29, 2016Updated 10 years ago
- A testing framework for distributed systems that can inject different types of network-partitioning faults☆16Dec 14, 2021Updated 4 years ago
- NIFTY is a fault tolerance tool to partial network partitions. In case of partial network partitions, NIFTY preserves cluster connectivit…☆21Jan 7, 2021Updated 5 years ago
- Elevator scheduling, an operating systems course assignment☆18Jun 26, 2018Updated 8 years ago
- Segmentation and classification of diabetic fundus lesions☆16Jun 12, 2023Updated 3 years ago
- Operating system memory management project☆14Jun 5, 2021Updated 5 years ago
- A 2D game engine for Elm☆26Jun 21, 2026Updated last week
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5☆16Sep 19, 2024Updated last year
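- Merges the per-chapter PDFs of the Chinese edition of Operating Systems: Three Easy Pieces (OSTEP) into a single file, with bookmarks added.☆22Apr 10, 2024Updated 2 years ago
- Implementing ゼロから作るDeep Learning ❸ (Deep Learning from Scratch 3) in C++. A self-study repository.☆17Aug 12, 2020Updated 5 years ago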
- ☆22Aug 25, 2024Updated last year