luliyucoordinate / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆11Jun 10, 2024Updated last year
Alternatives and similar repositories for flash-attention-minimal
Users that are interested in flash-attention-minimal are comparing it to the libraries listed below
- Awesome code, projects, books, etc. related to CUDA☆30Feb 3, 2026Updated 2 weeks ago
- 🎉My Collections of CUDA Kernels~☆11Jun 25, 2024Updated last year
- Inference deployment of the llama3☆11Apr 21, 2024Updated last year
- llama 2 Inference☆43Nov 4, 2023Updated 2 years ago
- paper-read-notes☆13Sep 26, 2024Updated last year
- Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e…☆13Jan 3, 2021Updated 5 years ago
- Implement Flash Attention using Cute.☆100Dec 17, 2024Updated last year
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.☆15Jun 1, 2023Updated 2 years ago
- A collection of useful code snippets☆13Jun 6, 2023Updated 2 years ago
- Inference for GOT-OCR2.0 using mnn-llm☆14Oct 2, 2024Updated last year
- learn TensorRT from scratch🥰☆18Sep 29, 2024Updated last year
- segmentation algorithm yolact use tensorrt deploy☆14May 7, 2022Updated 3 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆20Aug 3, 2025Updated 6 months ago
- A lightweight inference framework for large models☆21May 26, 2025Updated 8 months ago
- A minimal flash-attention implementation using cutlass, intended for teaching☆56Aug 12, 2024Updated last year
- HunyuanDiT with TensorRT and libtorch☆18May 22, 2024Updated last year
- A fork of the BEVDet series.☆21Oct 8, 2023Updated 2 years ago
- Inference Llama 2 in one file of pure Cuda☆17Aug 20, 2023Updated 2 years ago
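- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open source (ncnn port)☆41Nov 5, 2025Updated 3 months ago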
- PointPillars TensorRT version pretrained on MMDetection3d with WaymoOpenDataset☆22Aug 11, 2022Updated 3 years ago
- ☆20Dec 29, 2023Updated 2 years ago
- llama INT4 cuda inference with AWQ☆54Jan 20, 2025Updated last year
- ☆26Nov 21, 2024Updated last year
- This project is the Torch implementation of our ICCV 2017 paper: Centered Weight Normalization in Accelerating Training of Deep Neural…☆21Dec 7, 2019Updated 6 years ago
- A one-page-only CGraph-API-liked DAG project.☆26Feb 11, 2025Updated last year
- Llama3 Streaming Chat Sample☆22Apr 24, 2024Updated last year
- ☆26Aug 15, 2023Updated 2 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆114Sep 10, 2024Updated last year
- ☆27Aug 5, 2022Updated 3 years ago
- A plugin to make view transformer from perspective view to bird-eye-view, it is used in bevdet☆25Feb 24, 2023Updated 2 years ago
- ☆30Nov 16, 2024Updated last year
- ☆25Apr 16, 2022Updated 3 years ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆26Jan 23, 2026Updated 3 weeks ago
- 3d object detection model smoke c++ inference code☆39Dec 1, 2022Updated 3 years ago
- This is a repository to practice multi-thread programming in C++☆28Feb 21, 2024Updated last year
- [ICLR 2025] Official PyTorch Implementation for CPE: Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Ga…☆12Apr 7, 2025Updated 10 months ago
- yolov7-pose end2end TRT implementation☆27Sep 8, 2022Updated 3 years ago
- [ICRA 2024] WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection☆12Feb 6, 2024Updated 2 years ago
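- CLIP inference implemented in C++. The model has minor modifications; the modification and model-export code can be found in the readme, and the model files, including the AX650 models, are all in Releases. Support for ChineseCLIP has been added.☆31Jun 19, 2025Updated 7 months ago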