Flash Attention in ~100 lines of CUDA (forward pass only)
☆11Jun 10, 2024Updated last year
Alternatives and similar repositories for flash-attention-minimal
Users who are interested in flash-attention-minimal are comparing it to the libraries listed below.
- 🎉My Collections of CUDA Kernels~☆11Jun 25, 2024Updated last year
- Implement Flash Attention using Cute.☆103Dec 17, 2024Updated last year
- llama 2 Inference☆43Nov 4, 2023Updated 2 years ago
- Awesome code, projects, books, etc. related to CUDA☆32Feb 3, 2026Updated last month
- PointPillars TensorRT version pretrained on MMDetection3d with WaymoOpenDataset☆22Aug 11, 2022Updated 3 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.☆15Jun 1, 2023Updated 2 years ago
- A fork of the BEVDet series.☆22Oct 8, 2023Updated 2 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 7 months ago
- paper-read-notes☆13Sep 26, 2024Updated last year
- ☆27Aug 5, 2022Updated 3 years ago
- Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Edition)"☆34Aug 12, 2024Updated last year
- This project is the Torch implementation of our ICCV 2017 paper: Centered Weight Normalization in Accelerating Training of Deep Neural…☆21Dec 7, 2019Updated 6 years ago
- learn TensorRT from scratch🥰☆18Sep 29, 2024Updated last year
- A collection of code snippets worth saving☆13Jun 6, 2023Updated 2 years ago
- HunyuanDiT with TensorRT and libtorch☆18May 22, 2024Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency☆113Sep 10, 2024Updated last year
- A minimal flash-attention implementation using cutlass, intended for teaching purposes☆59Aug 12, 2024Updated last year
- Inference for GOT-OCR2.0 using mnn-llm☆14Oct 2, 2024Updated last year
- A plugin that implements the view transformer from perspective view to bird's-eye view; it is used in bevdet☆24Feb 24, 2023Updated 3 years ago
- ☆15Dec 30, 2024Updated last year
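- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open source (ncnn port)☆41Nov 5, 2025Updated 4 months ago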
- Deployment of the YOLACT segmentation algorithm with TensorRT☆14May 7, 2022Updated 3 years ago
- DLBlas: clean and efficient kernels☆35Mar 16, 2026Updated 2 weeks ago
- 3d object detection model smoke c++ inference code☆39Dec 1, 2022Updated 3 years ago
- Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e…☆13Jan 3, 2021Updated 5 years ago
- llama INT4 cuda inference with AWQ☆54Jan 20, 2025Updated last year
- Inference Llama 2 in one file of pure Cuda☆17Aug 20, 2023Updated 2 years ago
- Quantize yolov7 using pytorch_quantization.🚀🚀🚀☆12Oct 20, 2023Updated 2 years ago
- ☆18May 10, 2023Updated 2 years ago
- Homework of CMU 10-414/714: Deep Learning Systems (https://dlsyscourse.org/)☆15Mar 21, 2024Updated 2 years ago
- CUDA SGEMM optimization note☆15Oct 31, 2023Updated 2 years ago
- A lightweight inference framework for large models☆22May 26, 2025Updated 10 months ago
- This is a repository to practice multi-thread programming in C++☆28Feb 21, 2024Updated 2 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆16Aug 31, 2023Updated 2 years ago
- Google's MediaPipe (v0.8.9) and Python Wheel installer for Jetson Nano (JetPack 4.6) compiled for CUDA 10.2☆16Jun 7, 2023Updated 2 years ago
- StaRD: Statute Retrieval Dataset based on Real-World Legal Consultation☆20Apr 24, 2025Updated 11 months ago
- A one-page-only CGraph-API-liked DAG project.☆26Feb 11, 2025Updated last year
- ☆20Dec 29, 2023Updated 2 years ago
- CUTLASS and CuTe Examples☆134Nov 30, 2025Updated 3 months ago