This repository contains multiple implementations of Flash Attention written as Triton kernels, showing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques such as FP8 quantization and prefetching.
☆11 · Mar 26, 2026 · Updated 2 weeks ago
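The block-wise processing mentioned above is the core idea behind Flash Attention. Keys and values are consumed one tile at a time while a running maximum and a running softmax denominator are kept, so the full row of attention scores never has to be stored. The following is a minimal pure-Python sketch of that online-softmax idea for a single query vector. It is not Triton code and is not taken from this repository; the names `naive_attention` and `blockwise_attention` and the `block_size` parameter are hypothetical, chosen for illustration.

```python
import math

def naive_attention(q, keys, values):
    """Reference: ordinary softmax attention over all scores for one query."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim_v)]

def blockwise_attention(q, keys, values, block_size=2):
    """Hypothetical sketch: process K/V in blocks using the online-softmax trick.

    Keeps a running max, a running softmax denominator, and an unnormalized
    output accumulator; earlier partial results are rescaled whenever a new
    block raises the running max.
    """
    d = len(q)
    dim_v = len(values[0])
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * dim_v
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale previously accumulated terms to the new max (exp(-inf) == 0.0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            p = math.exp(s - new_max)
            running_sum += p
            for j in range(dim_v):
                acc[j] += p * v[j]
        running_max = new_max
    # Normalize once at the end, as Flash Attention does per output row.
    return [a / running_sum for a in acc]

# Usage: the block-wise result should match the reference up to float rounding.
q = [1.0, 0.5]
keys = [[0.2, 0.1], [1.0, -0.3], [0.5, 0.5], [-1.0, 2.0], [0.3, 0.9]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [2.0, -1.0], [1.0, 1.0]]
ref = naive_attention(q, keys, values)
out = blockwise_attention(q, keys, values, block_size=2)
print(all(abs(a - b) < 1e-9 for a, b in zip(ref, out)))  # True
```

A real kernel applies the same recurrence to whole tiles of queries held in on-chip memory; the scalar loops here only make the rescaling step easy to follow.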
Alternatives and similar repositories for Triton-FlashAttention
Users who are interested in Triton-FlashAttention are comparing it to the libraries listed below.
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆32 · Dec 24, 2025 · Updated 3 months ago
- ☆15 · Aug 3, 2025 · Updated 8 months ago
- MFAE-YOLO is an object detection method for aerial remote sensing images ☆18 · Jan 27, 2026 · Updated 2 months ago
- Project for sharing NLP algorithms ☆14 · Feb 19, 2019 · Updated 7 years ago
- Created as a self-reference for H.264 decoding on NVIDIA GPUs using the NVCUVID API. ☆15 · Jan 1, 2016 · Updated 10 years ago
- Serverless Firebase Functions ☆12 · Oct 15, 2019 · Updated 6 years ago
- ☆15 · Aug 18, 2022 · Updated 3 years ago
- ☆41 · Apr 2, 2026 · Updated last week
- ☆18 · Apr 2, 2025 · Updated last year
- Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371 ☆19 · Dec 13, 2022 · Updated 3 years ago
- ☆19 · Jan 10, 2023 · Updated 3 years ago
- A toolbox of commonly used deep learning components, procedures and applications ☆18 · Sep 8, 2023 · Updated 2 years ago
- [A chat software demo] A chat application powered by libevent, MySQL and Qt ☆10 · Sep 10, 2021 · Updated 4 years ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆41 · May 1, 2025 · Updated 11 months ago
- ☆11 · Apr 26, 2019 · Updated 6 years ago
- A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset ☆15 · Dec 9, 2025 · Updated 4 months ago
- Serverless VS Code web IDE ☆17 · Apr 14, 2023 · Updated 2 years ago
- Tongji University résumé template with minor localization changes (generated from fky2015/resume-ng) ☆15 · Dec 3, 2023 · Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆19 · Jun 19, 2025 · Updated 9 months ago
- ☆14 · Dec 21, 2025 · Updated 3 months ago
- Analyse problems of AI with math and code ☆27 · Jul 28, 2025 · Updated 8 months ago
- Learning RDMA programming ☆25 · Dec 6, 2021 · Updated 4 years ago
- Third course project for an operating systems class: a simple file system ☆12 · Jun 24, 2021 · Updated 4 years ago
- ☆27 · Dec 24, 2021 · Updated 4 years ago
- Accommodating Large Language Model Training over Heterogeneous Environment. ☆27 · Mar 13, 2025 · Updated last year
- My old book about programming for Symbian 9.x based smartphones, in Russian ☆12 · Jul 8, 2015 · Updated 10 years ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆36 · Jul 22, 2025 · Updated 8 months ago
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization ☆17 · May 29, 2025 · Updated 10 months ago
- TodoMVC with Akka-http, Scala.js, Autowire and React ☆32 · Jun 29, 2016 · Updated 9 years ago
- A testing framework for distributed systems that can inject different types of network-partitioning faults ☆16 · Dec 14, 2021 · Updated 4 years ago
- NIFTY is a fault tolerance tool for partial network partitions. In case of partial network partitions, NIFTY preserves cluster connectivit… ☆21 · Jan 7, 2021 · Updated 5 years ago
- Elevator scheduling, an operating systems course assignment ☆18 · Jun 26, 2018 · Updated 7 years ago
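- Segmentation and classification of diabetic fundus lesions ☆17 · Jun 12, 2023 · Updated 2 years ago
- Operating systems memory management project ☆14 · Jun 5, 2021 · Updated 4 years ago
- A 2D game engine for Elm ☆22 · Jun 21, 2025 · Updated 9 months ago
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5 ☆16 · Sep 19, 2024 · Updated last year
- Merges the per-chapter PDFs of the Chinese edition of Operating Systems: Three Easy Pieces (OSTEP) into a single file and adds bookmarks. ☆19 · Apr 10, 2024 · Updated 2 years ago
- ☆21 · Aug 25, 2024 · Updated last year
- Implements ゼロから作るDeep Learning ❸ (Deep Learning from Scratch 3) in C++. A self-study repository. ☆17 · Aug 12, 2020 · Updated 5 years ago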