luliyucoordinate / flash-attention-minimalLinks

Flash Attention in ~100 lines of CUDA (forward pass only)

☆10

Alternatives and similar repositories for flash-attention-minimal

Users that are interested in flash-attention-minimal are comparing it to the libraries listed below

Sorting:

caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆21Updated 3 weeks ago
DD-DuDa / TensorRT-in-Action
TensorRT-in-Action 是一个 GitHub 代码库，提供了使用 TensorRT 的代码示例，并有对应 Jupyter Notebook。
☆15Updated 2 years ago
Rythsman / TRT-Hackathon-2022-final
<Good Luck To You!> 's work for <TRT-Hackathon-2022-final>
☆7Updated 2 years ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆144Updated last week
caiwanxianhust / FasterLLaMA
使用 CUDA C++ 实现的 llama 模型推理框架
☆58Updated 9 months ago
ppppppppig / lite_lang
一个轻量化的大模型推理框架
☆20Updated 2 months ago
hopef / llama3_chat
Llama3 Streaming Chat Sample
☆22Updated last year
Tlntin / trt2023
☆26Updated last year
Qingrenn / mmdeploy-summer-camp
🐱 ncnn int8 模型量化评估
☆13Updated 2 years ago
kalfazed / multi-thread-programming
This is a repository to practice multi-thread programming in C++
☆25Updated last year
richjjj / cuvid-tensorrt-multi
ffmpeg+cuvid+tensorrt+multicamera
☆12Updated 7 months ago
Phoenix8215 / learn-TensorRT-from-scratch
learn TensorRT from scratch🥰
☆15Updated 10 months ago
richjjj / duscratch
搜藏的希望的代码片段
☆13Updated 2 years ago
lxl24 / SwinTransformerV2_TensorRT
For 2022 Nvidia Hackathon
☆22Updated 3 years ago
coderonion / cuda-beginner-course-cpp-version
bilibili视频【CUDA 12.x 并行编程入门(C++版)】配套代码
☆29Updated 11 months ago
sesmfs / onnx_quant_tool
An onnx-based quantitation tool.
☆71Updated last year
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆28Updated 2 years ago
li199603 / sgemm_with_cuda
SGEMM optimization with cuda step by step
☆20Updated last year
TRT2022 / trtllm-llama
☢️ TensorRT 2023复赛——基于TensorRT-LLM的Llama模型推断加速优化
☆50Updated last year
KuangjuX / CUDAKernels
🎉My Collections of CUDA Kernels~
☆11Updated last year
raymond1123 / hgemm
☆30Updated 8 months ago
cqu20160901 / DETR_onnx_tensorRT_V2
DETR tensor去除推理过程无用辅助头+fp16部署再次加速+解决转tensorrt 输出全为0问题的新方法。
☆12Updated last year
HuPengsheet / EasyNN
EasyNN是一个面向教学而开发的神经网络推理框架，旨在让大家0基础也能自主完成推理框架编写！
☆31Updated 11 months ago
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆66Updated 2 months ago
OpenPPL / ppl.kernel.cuda
☆37Updated 9 months ago
weishengying / cutlass_flash_atten_fp8
使用 cutlass 仓库在 ada 架构上实现 fp8 的 flash attention
☆74Updated 11 months ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆18Updated this week
hova88 / CUDA-MatMul-Practice
☆17Updated last year
luliyucoordinate / cute-flash-attention
Implement Flash Attention using Cute.
☆92Updated 7 months ago
Phoenix8215 / build_neural_network_from_scratch_CPP
Created a simple neural network using C++17 standard and the Eigen library that supports both forward and backward propagation.
☆9Updated last year