This repository contains multiple implementations of Flash Attention optimized with Triton kernels, showcasing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques like FP8 quantization and prefetching.
☆11 · Mar 26, 2026 · Updated last month
Alternatives and similar repositories for Triton-FlashAttention
Users that are interested in Triton-FlashAttention are comparing it to the libraries listed below.
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs ☆33 · Dec 24, 2025 · Updated 4 months ago
- ☆15 · Aug 3, 2025 · Updated 8 months ago
- MFAE-YOLO is an object detection method for aerial remote sensing images ☆18 · Jan 27, 2026 · Updated 3 months ago
- Project for sharing NLP algorithms ☆14 · Feb 19, 2019 · Updated 7 years ago
- Created as a self-reference for H.264 decoding on NVIDIA GPUs using the NVCUVID API. ☆15 · Jan 1, 2016 · Updated 10 years ago
- Serverless Firebase Functions ☆12 · Oct 15, 2019 · Updated 6 years ago
- ☆15 · Aug 18, 2022 · Updated 3 years ago
- ☆49 · Apr 14, 2026 · Updated 2 weeks ago
- ☆18 · Apr 2, 2025 · Updated last year
- Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371 ☆19 · Dec 13, 2022 · Updated 3 years ago
- ☆19 · Jan 10, 2023 · Updated 3 years ago
- A toolbox of commonly used deep learning components, procedures and applications ☆18 · Sep 8, 2023 · Updated 2 years ago
- [A chat software demo] A chat application powered by libevent/mysql and qt ☆10 · Sep 10, 2021 · Updated 4 years ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models" ☆42 · May 1, 2025 · Updated last year
- ☆11 · Apr 26, 2019 · Updated 7 years ago
- A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset ☆17 · Dec 9, 2025 · Updated 4 months ago
- Serverless VS Code web IDE ☆17 · Apr 14, 2023 · Updated 3 years ago
- Tongji University résumé template with a few minor localization changes (generated from fky2015/resume-ng) ☆15 · Dec 3, 2023 · Updated 2 years ago
- ☆14 · Dec 21, 2025 · Updated 4 months ago
- Analyse problems of AI with math and code ☆27 · Jul 28, 2025 · Updated 9 months ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference ☆20 · Jun 19, 2025 · Updated 10 months ago
- Learning RDMA programming ☆24 · Dec 6, 2021 · Updated 4 years ago
- Third course project for an operating systems class: a simple file system ☆12 · Jun 24, 2021 · Updated 4 years ago
- ☆27 · Dec 24, 2021 · Updated 4 years ago
- Accommodating Large Language Model Training over Heterogeneous Environment. ☆28 · Mar 13, 2025 · Updated last year
- My old book about programming for Symbian 9.x based smartphones, in Russian ☆13 · Jul 8, 2015 · Updated 10 years ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding ☆37 · Jul 22, 2025 · Updated 9 months ago
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization ☆17 · May 29, 2025 · Updated 11 months ago
- TodoMVC with Akka-http, Scala.js, Autowire and React ☆32 · Jun 29, 2016 · Updated 9 years ago
- A testing framework for distributed systems that can inject different types of network-partitioning faults ☆16 · Dec 14, 2021 · Updated 4 years ago
- NIFTY is a fault tolerance tool for partial network partitions. In case of partial network partitions, NIFTY preserves cluster connectivit… ☆21 · Jan 7, 2021 · Updated 5 years ago
- Elevator scheduling, an operating systems course assignment ☆18 · Jun 26, 2018 · Updated 7 years ago
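- Segmentation and classification of diabetic fundus lesions ☆17 · Jun 12, 2023 · Updated 2 years ago
- Operating system memory management project ☆14 · Jun 5, 2021 · Updated 4 years ago
- A 2D game engine for Elm ☆22 · Jun 21, 2025 · Updated 10 months ago
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5 ☆16 · Sep 19, 2024 · Updated last year
- Merges the per-chapter PDFs of the Chinese edition of Operating Systems: Three Easy Pieces (OSTEP) into a single file, with bookmarks added. ☆22 · Apr 10, 2024 · Updated 2 years ago
- ☆21 · Aug 25, 2024 · Updated last year
- A C++ implementation of ゼロから作るDeep Learning ❸ (Deep Learning from Scratch 3). A self-study repository. ☆17 · Aug 12, 2020 · Updated 5 years ago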