ankan-ban / llama2.cu
View external linksLinks

Inference Llama 2 in one file of pure Cuda

☆17

Alternatives and similar repositories for llama2.cu

Users that are interested in llama2.cu are comparing it to the libraries listed below

Sorting:

ankan-ban / llama_cu_awq
View on GitHub
llama INT4 cuda inference with AWQ
☆54Jan 20, 2025Updated last year
guoguo1314 / llama3_learn.c
View on GitHub
Inference deployment of the llama3
☆11Apr 21, 2024Updated last year
luliyucoordinate / flash-attention-minimal
View on GitHub
Flash Attention in ~100 lines of CUDA (forward pass only)
☆11Jun 10, 2024Updated last year
isLinXu / paper-read-notes
View on GitHub
paper-read-notes
☆13Sep 26, 2024Updated last year
nuwandda / cuda-histogram-equalization
View on GitHub
Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e…
☆13Jan 3, 2021Updated 5 years ago
richjjj / duscratch
View on GitHub
搜藏的希望的代码片段
☆13Jun 6, 2023Updated 2 years ago
RobertCsordas / moe_layer
View on GitHub
sigma-MoE layer
☆21Jan 5, 2024Updated 2 years ago
BaofengZan / mnn-llm-GOT-OCR2.0
View on GitHub
使用mnn-llm对GOT-OCR2.0进行推理
☆14Oct 2, 2024Updated last year
nlp-uoregon / oregon_gpt_oss_patching
View on GitHub
Efficient Finetuning for OpenAI GPT-OSS
☆23Oct 2, 2025Updated 4 months ago
Phoenix8215 / learn-TensorRT-from-scratch
View on GitHub
learn TensorRT from scratch🥰
☆18Sep 29, 2024Updated last year
bobbych94 / yolact-trt
View on GitHub
segmentation algorithm yolact use tensorrt deploy
☆14May 7, 2022Updated 3 years ago
inisis / OnnxLLM
View on GitHub
Large Language Model Onnx Inference Framework
☆35Nov 25, 2025Updated 2 months ago
ppppppppig / lite_lang
View on GitHub
一个轻量化的大模型推理框架
☆21May 26, 2025Updated 8 months ago
fattorib / Little-GPT
View on GitHub
GPT* - Training faster small transformers using ALiBi, Parallel Residual Connections and more!
☆21Oct 29, 2022Updated 3 years ago
zhahoi / PlateRecognition_ncnn
View on GitHub
高性能高精度大陆车牌、港澳车牌、台湾车牌韩国车牌(South Korea LPR)识别代码开源(ncnn移植）
☆41Nov 5, 2025Updated 3 months ago
yuxiaoranyu / stable_diffusion_trt_triton
View on GitHub
☆20Dec 29, 2023Updated 2 years ago
raymond1123 / Flash-Attention
View on GitHub
☆26Nov 21, 2024Updated last year
owensgroup / gpustats
View on GitHub
Statistics on GPUs
☆33Sep 8, 2025Updated 5 months ago
Tlntin / trt2023
View on GitHub
☆26Aug 15, 2023Updated 2 years ago
mhm0902 / Tensor_cpp
View on GitHub
☆25Apr 16, 2022Updated 3 years ago
raymond1123 / hgemm
View on GitHub
☆30Nov 16, 2024Updated last year
darglein / TinyTorch
View on GitHub
A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch
☆26Jan 23, 2026Updated 3 weeks ago
vedantroy / gpu_kernels
View on GitHub
☆27Jan 8, 2024Updated 2 years ago
jundaf2 / CUDA-INT8-GEMM
View on GitHub
CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API
☆35Sep 15, 2023Updated 2 years ago
BaofengZan / yolov7-pose-e2e-trt
View on GitHub
yolov7-pose end2end TRT实现
☆27Sep 8, 2022Updated 3 years ago
globaledgesoft / Unsupported-Operation-Development-in-SNPE
View on GitHub
This project is intended to build and deploy an SNPE model on Qualcomm Devices, which are having unsupported layers which are not part of…
☆10Oct 4, 2021Updated 4 years ago
piDack / The-ans-for-Programming-Massively-Parallel-Processor
View on GitHub
大规模并行处理器编程实战第二版答案
☆35Jun 4, 2022Updated 3 years ago
jcharum / lxml-readability
View on GitHub
python port of arc90's readability bookmarklet, updated to match latest readability.js!
☆19Sep 13, 2011Updated 14 years ago
hpc203 / LDC-onnxrun-cpp-py
View on GitHub
使用ONNXRuntime部署一种用于边缘检测的轻量级密集卷积神经网络LDC，包含C++和Python两个版本的程序
☆11Apr 24, 2023Updated 2 years ago
zeroine / cutlass-cute-sample
View on GitHub
☆49Apr 15, 2024Updated last year
triple-mu / TensorRT2ONNX
View on GitHub
A tool convert TensorRT engine/plan to a fake onnx
☆41Nov 22, 2022Updated 3 years ago
NVIDIA / online-softmax
View on GitHub
Benchmark code for the "Online normalizer calculation for softmax" paper
☆105Jul 27, 2018Updated 7 years ago
shouxieai / nerf
View on GitHub
nerf
☆41Aug 1, 2022Updated 3 years ago
gmarkall / life-of-a-numba-kernel
View on GitHub
Worked example of the process from Python source to CUDA kernel execution with Numba
☆45Sep 11, 2024Updated last year
BaofengZan / my_trt_pro
View on GitHub
跟着Tensorrt_pro学习各种知识
☆40Nov 25, 2022Updated 3 years ago
weishengying / tiny-flash-attention
View on GitHub
使用 cutlass 实现 flash-attention 精简版，具有教学意义
☆56Aug 12, 2024Updated last year
LeastAuthority / haskell-magic-wormhole
View on GitHub
Magic Wormhole for Haskell
☆11Apr 23, 2024Updated last year
DataXujing / YOLOv12-TensorRT
View on GitHub
YOLOv12 TensorRT 端到端模型加速推理和INT8量化实现
☆13Mar 5, 2025Updated 11 months ago
gesanqiu / Ericsson-Yolov3-SNPE
View on GitHub
关于算法处理实时视频流性能不足使用并行处理的方案和优化（APP层面）。
☆11Jun 5, 2021Updated 4 years ago

ankan-ban / llama2.cuView external linksLinks

Alternatives and similar repositories for llama2.cu

ankan-ban / llama2.cu
View external linksLinks