intel / ipex-llm-tutorial
Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆164 · Updated 2 months ago
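For context on what the "low-bit optimizations" in the description above refer to, here is a minimal, self-contained sketch of symmetric per-tensor INT4 weight quantization in plain NumPy. This illustrates the general idea only; it is not ipex-llm's actual implementation, and the helper names are hypothetical.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor INT4 quantization (illustrative sketch,
    not the ipex-llm implementation)."""
    # Signed INT4 range is [-8, 7]; map the largest magnitude to 7.
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate FP32 weights from the INT4 codes."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
q, scale = quantize_int4(w)
w_hat = dequantize_int4(q, scale)
# Round-trip error is bounded by half a quantization step (~scale / 2).
max_err = np.abs(w - w_hat).max()
```

Storing 4-bit codes plus one scale per tensor (or per group, in practice) is what cuts memory traffic and lets inference engines like those listed below run larger models on the same hardware.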
Alternatives and similar repositories for ipex-llm-tutorial
Users interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- LLM Inference benchmark ☆421 · Updated 11 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆257 · Updated last month
- ☆427 · Updated this week
- A lightweight llama-like LLM inference framework based on Triton kernels. ☆128 · Updated 2 weeks ago
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆129 · Updated 2 months ago
- ☆128 · Updated 6 months ago
- Materials for learning SGLang ☆457 · Updated last week
- Run DeepSeek-R1 GGUFs on KTransformers ☆238 · Updated 3 months ago
- C++ implementation of Qwen-LM ☆595 · Updated 6 months ago
- LLM/MLOps/LLMOps ☆94 · Updated last month
- Run ChatGLM3-6B on BM1684X ☆39 · Updated last year
- Run ChatGLM2-6B on BM1684X ☆49 · Updated last year
- ☆139 · Updated last year
- Run generative AI models on Sophgo BM1684X/BM1688 ☆220 · Updated last week
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆377 · Updated this week
- ☆168 · Updated this week
- llm-export exports LLM models to ONNX. ☆297 · Updated 5 months ago
- ☆59 · Updated 7 months ago
- FlagScale is a large-model toolkit built on open-source projects. ☆308 · Updated this week
- llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deploy… ☆82 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆101 · Updated last year
- ☆236 · Updated 2 weeks ago
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆71 · Updated 2 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆243 · Updated last year
- Community-maintained hardware plugin for vLLM on Ascend ☆815 · Updated this week
- FlagGems is an operator library for large language models implemented in the Triton language. ☆583 · Updated this week
- A low-latency, high-throughput serving engine for LLMs ☆382 · Updated last month
- A small language model supporting Chinese-language scenarios, llama2.c-zh ☆147 · Updated last year
- ☆435 · Updated this week
- Efficient and easy multi-instance LLM serving ☆440 · Updated this week