sunkx109 / llama.cpp

llama 2 Inference

☆43

Alternatives and similar repositories for llama.cpp

Users who are interested in llama.cpp are comparing it to the libraries listed below.

JieRen98 / SGEMM-SASS-Annotation
☆21 Updated 4 years ago
OpenPPL / ppl.llm.kernel.cuda
☆152 Updated last year
InfiniTensor / RefactorGraph
A layered, decoupled deep learning inference engine
☆79 Updated 11 months ago
CalvinXKY / BasicCUDA
A tutorial for CUDA & PyTorch
☆227 Updated last week
RussWong / LLM-engineering
☆26 Updated 5 months ago
caiwanxianhust / FasterLLaMA
A llama model inference framework implemented in CUDA C++
☆64 Updated last year
OpenPPL / ppl.pmx
☆60 Updated last year
dianhsu / transformer-cpp-cpu
A simple Transformer model implemented in C++. Attention Is All You Need.
☆53 Updated 4 years ago
AyakaGEMM / Hands-on-GEMM
☆145 Updated last year
BBuf / how-to-optimize-gemm
☆98 Updated 4 years ago
GetUpEarlier / minit
☆27 Updated last year
Syencil / Programming_Massively_Parallel_Processors
Code and notes for the 6 major CUDA parallel computing patterns
☆61 Updated 5 years ago
OpenPPL / ppl.kernel.cuda
☆38 Updated last year
Cambricon / mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
☆150 Updated 2 weeks ago
zjhellofss / KuiperCourse
A course on Bilibili
☆82 Updated 2 years ago
OpenPPL / ppl.nn.llm
☆141 Updated last year
OpenPPL / ppl.llm.serving
☆130 Updated last year
weishengying / cute_gemm
☆21 Updated last year
OpenPPL / ppl.kernel.cpu
☆19 Updated last year
Tlntin / trt2023
☆26 Updated 2 years ago
njuhope / cuda_sgemm
☆120 Updated last year
hyperai / triton-cn
Triton documentation in Simplified Chinese
☆102 Updated last month
piDack / The-ans-for-Programming-Massively-Parallel-Processor
Answers for Programming Massively Parallel Processors, 2nd edition
☆35 Updated 3 years ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆136 Updated 2 years ago
MARD1NO / CUDA-PPT
☆118 Updated 10 months ago
MegEngine / mperf
mperf is an operator performance tuning toolbox for mobile and embedded platforms
☆193 Updated 2 years ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 Updated 4 months ago
harleyszhang / llm_counts
LLM theoretical performance analysis tools, supporting params, FLOPs, memory, and latency analysis.
☆115 Updated 6 months ago
weishengying / cutlass_flash_atten_fp8
FP8 flash attention implemented on the Ada architecture using the cutlass repository
☆78 Updated last year
weishengying / tiny-flash-attention
A simplified flash-attention implemented with cutlass, intended for teaching
☆54 Updated last year