intel / ipex-llm-tutorial
Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆155 · Updated 6 months ago
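A minimal sketch of what these low-bit optimizations look like in use, assuming the transformers-style loading API described in the ipex-llm tutorial; the model id and prompt below are placeholders:

```python
# Sketch: load a Hugging Face model with ipex-llm's INT4 weight quantization.
# Assumes the ipex_llm.transformers wrapper mirrors the standard transformers API;
# the model id and prompt are placeholders, not part of the original listing.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# load_in_4bit=True applies the low-bit (INT4) optimization at load time
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("What is low-bit quantization?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```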
Alternatives and similar repositories for ipex-llm-tutorial:
Users interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- ☆394 · Updated last week
- ☆243 · Updated last month
- llm-export can export LLM models to ONNX. ☆257 · Updated last week
- Qwen (通义千问) vLLM inference and deployment demo. ☆501 · Updated 10 months ago
- LLM inference benchmark. ☆382 · Updated 6 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆220 · Updated this week
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch. ☆294 · Updated this week
- ☆151 · Updated last month
- Run generative AI models on the Sophgo BM1684X. ☆155 · Updated this week
- A beginner-friendly introductory tutorial on model compression. ☆226 · Updated 2 months ago
- PaddlePaddle custom device implementation (『飞桨』 custom hardware integration). ☆78 · Updated last week
- LLM101n: Let's build a Storyteller (Chinese edition). ☆122 · Updated 5 months ago
- LLM theoretical performance analysis tool supporting parameter, FLOPs, memory, and latency analysis. ☆76 · Updated 3 weeks ago
- Theory and practice of large-model / LLM inference and deployment. ☆144 · Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆607 · Updated last week
- Export LLaMA to ONNX. ☆112 · Updated last month
- A pupil in the computer world. (Felix Fu) ☆210 · Updated 7 months ago
- A lightweight LLM inference framework. ☆713 · Updated 9 months ago
- unify-easy-llm (ULM) aims to be a simple, one-click training tool for large models, supporting different hardware such as NVIDIA GPUs and Ascend NPUs as well as commonly used large models. ☆54 · Updated 6 months ago
- A lightweight LLaMA-like LLM inference framework based on Triton kernels. ☆78 · Updated 3 weeks ago
- A streamlined and customizable framework for efficient large-model evaluation and performance benchmarking. ☆379 · Updated this week
- Run ChatGLM3-6B on the BM1684X. ☆37 · Updated 10 months ago
- Run ChatGLM2-6B on the BM1684X. ☆49 · Updated 10 months ago
- ☆140 · Updated 9 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆390 · Updated last week
- Triton documentation in Simplified Chinese / Triton 中文文档. ☆52 · Updated 2 weeks ago
- Llama3-Tutorial (XTuner, LMDeploy, OpenCompass). ☆500 · Updated 8 months ago
- Run generative AI models with a simple C++/Python API using the OpenVINO Runtime. ☆201 · Updated this week
- LLM/MLOps/LLMOps. ☆66 · Updated 4 months ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray. ☆110 · Updated 2 months ago