intel / ipex-llm-tutorial
Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆168 · Updated 8 months ago
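To illustrate what the low-bit optimizations in the headline refer to: quantization stores weights in 4 or 8 bits instead of FP16/FP32, trading a small, bounded rounding error for memory and bandwidth savings. The sketch below is a minimal NumPy illustration of symmetric per-tensor INT4 quantization; the function names are hypothetical, and real ipex-llm kernels use per-group scales and packed low-bit storage.

```python
import numpy as np

def quantize_int4_sym(w: np.ndarray):
    """Symmetric per-tensor INT4 quantization (illustrative sketch only;
    production kernels quantize per group and pack two values per byte)."""
    scale = np.abs(w).max() / 7.0  # symmetric INT4 range is [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 8)).astype(np.float32)
    q, s = quantize_int4_sym(w)
    w_hat = dequantize(q, s)
    # Round-to-nearest bounds the error by half a quantization step.
    print(np.abs(w - w_hat).max() <= s / 2 + 1e-6)
```

The same idea extends to the other formats listed (FP4, FP8, INT8): each shrinks the per-weight bit width while keeping a scale factor to map back to floating point.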
Alternatives and similar repositories for ipex-llm-tutorial
Users interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- ☆436 · Updated 4 months ago
- LLM Inference benchmark ☆432 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆274 · Updated 5 months ago
- LLM/MLOps/LLMOps ☆131 · Updated 7 months ago
- Run DeepSeek-R1 GGUFs on KTransformers ☆259 · Updated 10 months ago
- ☆390 · Updated this week
- Small language models for Chinese-language scenarios: llama2.c-zh ☆150 · Updated last year
- ☆181 · Updated this week
- Accelerate inference without tears ☆372 · Updated 2 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆478 · Updated this week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- A lightweight LLM inference framework ☆748 · Updated last year
- FlagScale is a large-model toolkit built on open-source projects ☆466 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆75 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs) ☆251 · Updated last year
- Low-bit LLM inference on CPU/NPU with lookup tables ☆912 · Updated 7 months ago
- Run generative AI models on Sophgo BM1684X/BM1688 ☆260 · Updated 2 weeks ago
- ☆520 · Updated 2 weeks ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆219 · Updated last week
- LLM101n: Let's build a Storyteller (Chinese translation) ☆138 · Updated last year
- A high-performance inference system for large language models, designed for production environments ☆490 · Updated last month
- ☆72 · Updated 2 weeks ago
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆99 · Updated last month
- Export LLaMA to ONNX ☆137 · Updated last year
- C++ implementation of Qwen-LM ☆617 · Updated last year
- vLLM Documentation in Simplified Chinese / vLLM 中文文档 ☆152 · Updated last month
- Efficient AI inference & serving ☆480 · Updated 2 years ago
- A high-performance deep learning training platform with task-level time-shared GPU scheduling ☆731 · Updated 2 years ago
- Performance testing for LLM inference services ☆44 · Updated 2 years ago
- Run generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆419 · Updated this week