intel / ipex-llm-tutorial
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆161 · Updated 7 months ago
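The low-bit flow the tutorial covers usually reduces to a few lines. The sketch below assumes the `ipex_llm.transformers` drop-in wrapper for Hugging Face `transformers` and uses an illustrative model id; treat it as an outline rather than the tutorial's exact code.

```python
# Minimal sketch of low-bit loading with ipex-llm (assumed API surface:
# ipex_llm.transformers mirrors Hugging Face transformers; verify against the tutorial).
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model id, not prescribed by the tutorial

# load_in_4bit=True applies INT4 weight quantization at load time; the alternative
# load_in_low_bit="fp4" / "fp8" / "sym_int8" keyword selects other low-bit formats (assumed values).
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is low-bit quantization?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```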
Alternatives and similar repositories for ipex-llm-tutorial:
Users interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆236 · Updated 2 weeks ago
- ☆403 · Updated this week
- LLM inference benchmark ☆399 · Updated 7 months ago
- C++ implementation of Qwen-LM ☆581 · Updated 2 months ago
- llm-export can export LLM models to ONNX. ☆267 · Updated last month
- LLM101n: Let's build a Storyteller (Chinese version) ☆124 · Updated 6 months ago
- LLM/MLOps/LLMOps ☆75 · Updated 5 months ago
- ☆154 · Updated this week
- Run ChatGLM2-6B on the BM1684X ☆49 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (a brief roofline sketch follows this list). ☆92 · Updated 11 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆305 · Updated this week
- Run generative AI models on the Sophgo BM1684X ☆175 · Updated this week
- Inference code for LLaMA models ☆113 · Updated last year
- ☆250 · Updated this week
- A lightweight, LLaMA-like LLM inference framework based on Triton kernels. ☆94 · Updated last week
- Qwen (通义千问) inference and deployment demo using vLLM ☆527 · Updated 11 months ago
- Theory and practice of LLM inference and deployment ☆184 · Updated 3 weeks ago
- FlagGems is an operator library for large language models implemented in the Triton language (a toy Triton kernel sketch follows this list). ☆429 · Updated this week
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆56 · Updated last month
- A hands-on project for campus recruiting, autumn/spring hiring, and internships: build an LLM inference framework from scratch that supports LLaMA 2/3 and Qwen2.5. ☆296 · Updated last month
- Export LLaMA to ONNX ☆114 · Updated 2 months ago
- A lightweight LLM inference framework ☆715 · Updated 10 months ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆356 · Updated last year
- ☆127 · Updated 2 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. “面壁小钢炮” focuses on achi… ☆211 · Updated 4 months ago
- Materials for learning SGLang ☆299 · Updated this week
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆237 · Updated 11 months ago
- PaddlePaddle custom device implementation. (『飞桨』自定义硬件接入实现) ☆81 · Updated this week
- ☆38 · Updated 4 months ago
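For the roofline-model comparison listed above, here is a minimal sketch of the underlying arithmetic: attainable throughput is the minimum of peak compute and memory bandwidth times arithmetic intensity. The hardware figures are placeholders, not measurements of any listed platform.

```python
# Roofline model: attainable FLOP/s = min(peak_flops, mem_bw * arithmetic_intensity).
# The hardware figures below are illustrative placeholders, not vendor specs.

def attainable_flops(peak_flops: float, mem_bw: float, arithmetic_intensity: float) -> float:
    """peak_flops in FLOP/s, mem_bw in bytes/s, arithmetic_intensity in FLOP/byte."""
    return min(peak_flops, mem_bw * arithmetic_intensity)

# Decode-phase LLM inference is typically memory-bound: an FP16 GEMV does about
# 2 FLOPs per 2-byte weight, i.e. roughly 1 FLOP/byte at batch size 1.
peak = 100e12   # 100 TFLOP/s (placeholder accelerator)
bw = 1.0e12     # 1 TB/s memory bandwidth (placeholder)
for intensity in (1, 10, 100, 1000):
    print(f"AI={intensity:>4} FLOP/byte -> {attainable_flops(peak, bw, intensity) / 1e12:.1f} TFLOP/s")
```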
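The FlagGems entry describes operators written in the Triton language; the toy element-wise add below (not FlagGems code) sketches what such a Triton operator looks like: each program instance loads one block of elements, masks the ragged tail, and writes its result back.

```python
# Toy Triton element-wise add, illustrating the style of kernel an operator
# library like FlagGems is built from (this is not FlagGems code).
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # one program per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # x and y must be CUDA tensors of the same shape.
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```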