ppppppppig / lite_langLinks

一个轻量化的大模型推理框架

☆20

Alternatives and similar repositories for lite_lang

Users that are interested in lite_lang are comparing it to the libraries listed below

Sorting:

raymond1123 / hgemm
☆30Updated 8 months ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆144Updated last week
caiwanxianhust / FasterLLaMA
使用 CUDA C++ 实现的 llama 模型推理框架
☆58Updated 9 months ago
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆28Updated 2 years ago
luliyucoordinate / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆10Updated last year
Tlntin / trt2023
☆26Updated last year
shouxieai / tensorRT_quantization
该代码与B站上的视频 https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7 相关联。
☆69Updated last year
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆21Updated 3 weeks ago
hopef / llama3_chat
Llama3 Streaming Chat Sample
☆22Updated last year
Rythsman / TRT-Hackathon-2022-final
<Good Luck To You!> 's work for <TRT-Hackathon-2022-final>
☆7Updated 2 years ago
sesmfs / onnx_quant_tool
An onnx-based quantitation tool.
☆71Updated last year
DD-DuDa / TensorRT-in-Action
TensorRT-in-Action 是一个 GitHub 代码库，提供了使用 TensorRT 的代码示例，并有对应 Jupyter Notebook。
☆15Updated 2 years ago
kalfazed / multi-thread-programming
This is a repository to practice multi-thread programming in C++
☆25Updated last year
woodx9 / tllm
create your own llm inference server from scratch
☆12Updated 8 months ago
caijixueIT / CUDA_Learning_for_Freshman
☆11Updated 5 months ago
weishengying / tiny-flash-attention
使用 cutlass 实现 flash-attention 精简版，具有教学意义
☆45Updated 11 months ago
richjjj / duscratch
搜藏的希望的代码片段
☆13Updated 2 years ago
zjhellofss / triton_course
☆33Updated 2 months ago
raymond1123 / Flash-Attention
☆25Updated 8 months ago
wangzyon / pyInfer
async inference for machine learning model
☆26Updated 2 years ago
TRT2022 / trtllm-llama
☢️ TensorRT 2023复赛——基于TensorRT-LLM的Llama模型推断加速优化
☆50Updated last year
zpye / SimpleInfer
A simple neural network inference framework
☆25Updated 2 years ago
hova88 / CUDA-MatMul-Practice
☆17Updated last year
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆150Updated 6 months ago
HuPengsheet / EasyNN
EasyNN是一个面向教学而开发的神经网络推理框架，旨在让大家0基础也能自主完成推理框架编写！
☆31Updated 11 months ago
OpenPPL / ppl.kernel.cuda
☆37Updated 9 months ago
jinmin527 / learning-cuda-trt
A large number of cuda/tensorrt cases . 大量案例来学习cuda/tensorrt
☆139Updated 3 years ago
ozanarmagan / clip_tokenizer_cpp
☆10Updated last year
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆66Updated 2 months ago
torchpipe / torchpipe
Serving Inside Pytorch
☆163Updated last week