This repository contains multiple implementations of Flash Attention optimized with Triton kernels, showcasing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques like FP8 quantization and prefetching.
☆11 · Mar 26, 2026 · Updated 2 months ago
Alternatives and similar repositories for Triton-FlashAttention
Users that are interested in Triton-FlashAttention are comparing it to the libraries listed below.
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆34 · Dec 24, 2025 · Updated 5 months ago
- ☆15 · Aug 3, 2025 · Updated 10 months ago
- Created as a self-reference for h264 decoding on NVIDIA GPUs using the NVCUVID API. ☆15 · Jan 1, 2016 · Updated 10 years ago
- Project for sharing nlp algorithms ☆14 · Feb 19, 2019 · Updated 7 years ago
- Real-time decision features without streaming infra. Turn live events into product reflexes: no Kafka, no Flink, no feature store. ☆133 · May 30, 2026 · Updated last week
- Serverless Firebase Functions ☆12 · Oct 15, 2019 · Updated 6 years ago
- ☆15 · Aug 18, 2022 · Updated 3 years ago
- ☆53 · Apr 14, 2026 · Updated last month
- ☆18 · Apr 2, 2025 · Updated last year
- Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371 ☆19 · Dec 13, 2022 · Updated 3 years ago
- ☆19 · Jan 10, 2023 · Updated 3 years ago
- A toolbox of commonly used deep learning components, procedures and applications ☆18 · Sep 8, 2023 · Updated 2 years ago
- [A chat software demo] Chat software powered by libevent/mysql and qt ☆10 · Sep 10, 2021 · Updated 4 years ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆43 · May 1, 2025 · Updated last year
- ☆11 · Apr 26, 2019 · Updated 7 years ago
- A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset ☆17 · Dec 9, 2025 · Updated 6 months ago
- serverless vscode webide ☆17 · Apr 14, 2023 · Updated 3 years ago
- ☆14 · Dec 21, 2025 · Updated 5 months ago
- Analyse problems of AI with math and code ☆30 · Jul 28, 2025 · Updated 10 months ago
- Tongji University résumé template with a few small localization changes (generated from fky2015/resume-ng) ☆17 · Dec 3, 2023 · Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆20 · Jun 19, 2025 · Updated 11 months ago
- Learning RDMA programming ☆24 · Dec 6, 2021 · Updated 4 years ago
- Third course project for an operating systems class: a simple file system ☆12 · Jun 24, 2021 · Updated 4 years ago
- ☆27 · Dec 24, 2021 · Updated 4 years ago
- Accommodating Large Language Model Training over Heterogeneous Environment. ☆31 · Mar 13, 2025 · Updated last year
- My old book about programming for Symbian 9.x based smartphones, in Russian ☆14 · Jul 8, 2015 · Updated 10 years ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆37 · Jul 22, 2025 · Updated 10 months ago
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization ☆18 · May 29, 2025 · Updated last year
- TodoMVC with Akka-http, Scala.js, Autowire and React ☆32 · Jun 29, 2016 · Updated 9 years ago
- A testing framework for distributed systems that can inject different types of network-partitioning faults ☆16 · Dec 14, 2021 · Updated 4 years ago
- NIFTY is a fault tolerance tool for partial network partitions. In case of partial network partitions, NIFTY preserves cluster connectivit… ☆21 · Jan 7, 2021 · Updated 5 years ago
- Elevator scheduling, an operating systems course assignment ☆18 · Jun 26, 2018 · Updated 7 years ago
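- Segmentation and classification of diabetic fundus lesions ☆16 · Jun 12, 2023 · Updated 2 years ago
- Operating system memory management project ☆14 · Jun 5, 2021 · Updated 5 years ago
- A 2D game engine for Elm ☆23 · Jun 21, 2025 · Updated 11 months ago
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5 ☆16 · Sep 19, 2024 · Updated last year
- Merges the per-chapter PDFs of the Chinese edition of Operating Systems: Three Easy Pieces (OSTEP) into a single file and adds bookmarks. ☆22 · Apr 10, 2024 · Updated 2 years ago
- Implements the book ゼロから作るDeep Learning ❸ (Deep Learning from Scratch 3) in C++. A self-study repository. ☆17 · Aug 12, 2020 · Updated 5 years ago
- ☆22 · Aug 25, 2024 · Updated last year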