This repository contains multiple implementations of Flash Attention optimized with Triton kernels, showcasing progressive performance improvements through hardware-aware optimizations. The implementations range from basic block-wise processing to advanced techniques like FP8 quantization and prefetching.
☆11Mar 26, 2026Updated last month
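The "basic block-wise processing" mentioned in the description can be illustrated with a small sketch. This is not the repository's Triton code: it is a plain-Python illustration of the general online-softmax idea behind Flash Attention, and the function names (`naive_attention`, `blockwise_attention`) and the `block_size` parameter are hypothetical names chosen for this example. It assumes a single head, no masking, and inputs given as lists of rows.

```python
import math


def naive_attention(q, k, v):
    """Reference: softmax(q k^T / sqrt(d)) v, using full score rows."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(dv)])
    return out


def blockwise_attention(q, k, v, block_size=2):
    """Visit keys/values one block at a time (hypothetical helper).

    Only a running max, a running sum, and an unnormalized accumulator
    are kept per query row, so the full score row is never stored.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        running_max = -math.inf
        running_sum = 0.0
        acc = [0.0] * dv
        for start in range(0, len(k), block_size):
            k_blk = k[start:start + block_size]
            v_blk = v[start:start + block_size]
            scores = [scale * sum(a * b for a, b in zip(qi, kj))
                      for kj in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new running max.
            correction = math.exp(running_max - new_max)
            running_sum *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v_blk):
                w = math.exp(s - new_max)
                running_sum += w
                for c in range(dv):
                    acc[c] += w * vj[c]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Under these assumptions, both functions should agree up to floating-point rounding for any `block_size`. A GPU kernel would apply the same rescaling per tile of queries and keys rather than per row.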
Alternatives and similar repositories for Triton-FlashAttention
Users that are interested in Triton-FlashAttention are comparing it to the libraries listed below.
- Mixture-of-Basis-Experts for Compressing MoE-based LLMs☆34Dec 24, 2025Updated 4 months ago
- ☆15Aug 3, 2025Updated 9 months ago
- MFAE-YOLO is an object detection method for aerial remote sensing images☆18Jan 27, 2026Updated 3 months ago
- Created as a self-reference for h264 decoding on NVIDIA GPUs using the NVCUVID API.☆15Jan 1, 2016Updated 10 years ago
- Project for sharing NLP algorithms☆14Feb 19, 2019Updated 7 years ago
- Real-time decision features without streaming infra. Turn live events into product reflexes — no Kafka, no Flink, no feature store.☆132Updated this week
- Serverless Firebase Functions☆12Oct 15, 2019Updated 6 years ago
- ☆15Aug 18, 2022Updated 3 years ago
- ☆51Apr 14, 2026Updated last month
- ☆18Apr 2, 2025Updated last year
- Autoencoder based image compression: can the learning be quantization independent? https://arxiv.org/abs/1802.09371☆19Dec 13, 2022Updated 3 years ago
- ☆19Jan 10, 2023Updated 3 years ago
- A toolbox of commonly used deep learning components, procedures and applications☆18Sep 8, 2023Updated 2 years ago
- [A chat software demo] Chat software powered by libevent/MySQL and Qt☆10Sep 10, 2021Updated 4 years ago
- Official implementation of the ICLR paper "Streamlining Redundant Layers to Compress Large Language Models"☆43May 1, 2025Updated last year
- ☆11Apr 26, 2019Updated 7 years ago
- A High-Quality Diabetic Retinopathy Pixel-Level Annotation Dataset☆17Dec 9, 2025Updated 5 months ago
- serverless vscode webide☆17Apr 14, 2023Updated 3 years ago
- ☆14Dec 21, 2025Updated 5 months ago
- analyse problems of AI with Math and Code☆29Jul 28, 2025Updated 9 months ago
- Tongji University résumé template with a few minor localization changes (generated from fky2015/resume-ng)☆17Dec 3, 2023Updated 2 years ago
- [NAACL 2025🔥] MEDA: Dynamic KV Cache Allocation for Efficient Multimodal Long-Context Inference☆20Jun 19, 2025Updated 11 months ago
- Learning RDMA programming☆24Dec 6, 2021Updated 4 years ago
- Third course project for an operating systems class: a simple file system☆12Jun 24, 2021Updated 4 years ago
- ☆27Dec 24, 2021Updated 4 years ago
- Accommodating Large Language Model Training over Heterogeneous Environment.☆30Mar 13, 2025Updated last year
- My old book about programming for Symbian 9.x based smartphones, in Russian☆14Jul 8, 2015Updated 10 years ago
- [CVPR 2025] Docopilot: Improving Multimodal Models for Document-Level Understanding☆37Jul 22, 2025Updated 9 months ago
- HIPPO: Enhancing the Table Understanding Capability of Large Language Models through Hybrid-Modal Preference Optimization☆17May 29, 2025Updated 11 months ago
- TodoMVC with Akka-http, Scala.js, Autowire and React☆32Jun 29, 2016Updated 9 years ago
- A testing framework for distributed systems that can inject different types of network-partitioning faults☆16Dec 14, 2021Updated 4 years ago
- NIFTY is a fault tolerance tool to partial network partitions. In case of partial network partitions, NIFTY preserves cluster connectivit…☆21Jan 7, 2021Updated 5 years ago
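- Elevator scheduling, an operating systems course assignment☆18Jun 26, 2018Updated 7 years ago
- Segmentation and classification of diabetic fundus lesions☆16Jun 12, 2023Updated 2 years ago
- Operating system memory management project☆14Jun 5, 2021Updated 4 years ago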
- A 2D game engine for Elm☆22Jun 21, 2025Updated 11 months ago
- Llama causal LM fully recreated in LibTorch. Designed to be used in Unreal Engine 5☆16Sep 19, 2024Updated last year
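- Merges the per-chapter PDFs of the Chinese edition of Operating Systems: Three Easy Pieces (OSTEP) into a single file, with bookmarks added.☆22Apr 10, 2024Updated 2 years ago
- Implements ゼロから作るDeep Learning ❸ (Deep Learning from Scratch 3) in C++. A self-study repository.☆17Aug 12, 2020Updated 5 years ago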