woodx9 / tllm

create your own llm inference server from scratch

☆12

Alternatives and similar repositories for tllm

Users who are interested in tllm compare it to the repositories listed below.

harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆144 Updated last week
caiwanxianhust / FasterLLaMA
A LLaMA model inference framework implemented in CUDA C++
☆58 Updated 8 months ago
zjhellofss / KuiperCourse
A course hosted on Bilibili
☆75 Updated last year
InfiniTensor / RefactorGraph
A layered, decoupled deep learning inference engine
☆74 Updated 5 months ago
HuPengsheet / EasyNN
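EasyNN is a neural network inference framework built for teaching, designed so that anyone can write an inference framework on their own, even with no prior background!
☆31 Updated 11 months ago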
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆150 Updated 6 months ago
OpenPPL / ppl.kernel.cuda
☆37 Updated 9 months ago
zjhellofss / triton_course
☆33 Updated 2 months ago
ppppppppig / lite_lang
A lightweight large-model inference framework
☆20 Updated 2 months ago
TRT2022 / trtllm-llama
☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM
☆50 Updated last year
harleyszhang / llm_counts
LLM theoretical performance analysis tools, supporting parameter, FLOPs, memory, and latency analysis.
☆101 Updated 3 weeks ago
zpye / SimpleInfer
A simple neural network inference framework
☆25 Updated 2 years ago
Syencil / Programming_Massively_Parallel_Processors
Code and notes for the six major CUDA parallel computing patterns
☆60 Updated 5 years ago
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆28 Updated 2 years ago
ifromeast / cuda_learning
learning how CUDA works
☆295 Updated 5 months ago
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆129 Updated last year
OpenPPL / ppl.pmx
☆59 Updated 8 months ago
iclementine / optimize_softmax
Optimize softmax in triton in many cases
☆21 Updated 11 months ago
luliyucoordinate / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆10 Updated last year
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆66 Updated 2 months ago
AyakaGEMM / Hands-on-GEMM
☆137 Updated last year
torchpipe / torchpipe
Serving inside PyTorch
☆163 Updated this week
OpenPPL / ppl.kernel.cpu
☆17 Updated last year
FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM
☆42 Updated last year
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆21 Updated 3 weeks ago
OpenPPL / ppl.nn.llm
☆139 Updated last year
Tlntin / trt2023
☆26 Updated last year
FlagOpen / FlagCX
☆81 Updated last week
doorteeth / learn_cuda
☆41 Updated 3 years ago
JieRen98 / SGEMM-SASS-Annotation
☆21 Updated 4 years ago