OpenBMB / cpm_kernels

☆25

Alternatives and similar repositories for cpm_kernels

Users who are interested in cpm_kernels are comparing it to the libraries listed below.

Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17 · Updated 2 years ago
Ascend / AscendSpeed
☆79 · Updated last year
SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16 · Updated last year
OpenPPL / ppl.llm.serving
☆129 · Updated 10 months ago
THUDM / FasterTransformer
Transformer related optimization, including BERT, GPT
☆39 · Updated 2 years ago
Oneflow-Inc / models
Models and examples built with OneFlow
☆100 · Updated last year
TRT2022 / trtllm-llama
☢️ TensorRT 2023 semifinal: Llama model inference acceleration and optimization based on TensorRT-LLM
☆50 · Updated 2 years ago
Oneflow-Inc / one-glm
A more efficient GLM implementation!
☆54 · Updated 2 years ago
void-main / FasterTransformer
Transformer related optimization, including BERT, GPT
☆59 · Updated 2 years ago
DeepLink-org / dlinfer
☆64 · Updated this week
LowinLi / transformers-stream-generator
This is a text generation method which returns a generator, streaming out each token in real-time during inference, based on Huggingface/…
☆97 · Updated last year
neuralmagic / AutoFP8
☆205 · Updated 5 months ago
wangguojim / LargeScale
☆19 · Updated last year
Adlik / smoothquantplus
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
☆23 · Updated last year
volcengine / veGiantModel
☆219 · Updated 2 years ago
luchangli03 / onnxsim_large_model
simplify >2GB large onnx model
☆63 · Updated 10 months ago
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆266 · Updated 2 months ago
DeepLink-org / ditorch
☆23 · Updated 9 months ago
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆180 · Updated 6 months ago
inferflow / inferflow
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
☆248 · Updated last year
K024 / chatglm-q
Another ChatGLM2 implementation for GPTQ quantization
☆53 · Updated 2 years ago
Tencent / AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
☆185 · Updated this week
ModelTC / awesome-lm-system
Summary of system papers/frameworks/codes/tools on training or serving large model
☆57 · Updated last year
InternLM / turbomind
☆97 · Updated 7 months ago
YellowOldOdd / SDBI
Simple Dynamic Batching Inference
☆145 · Updated 3 years ago
AniZpZ / AutoSmoothQuant
An easy-to-use package for implementing SmoothQuant for LLMs
☆107 · Updated 6 months ago
wangkuiyi / huggingface-tokenizer-in-cxx
☆69 · Updated 2 years ago
anyscale / llm-continuous-batching-benchmarks
☆121 · Updated last year
zhaochenyang20 / ModelServer
Efficient, Flexible, and Highly Fault-Tolerant Model Service Management Based on SGLang
☆58 · Updated 11 months ago
OpenBMB / BMCook
Model Compression for Big Models
☆165 · Updated 2 years ago