intel-analytics / ipex-llm-tutorial
Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆152 · Updated 5 months ago
Alternatives and similar repositories for ipex-llm-tutorial:
Users who are interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- ☆392 · Updated this week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆208 · Updated this week
- LLM Inference benchmark ☆377 · Updated 5 months ago
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆496 · Updated 9 months ago
- Run ChatGLM2-6B on BM1684X ☆49 · Updated 10 months ago
- llm-export can export LLM models to ONNX. ☆255 · Updated last week
- Low-bit LLM inference on CPU with lookup table ☆646 · Updated last week
- vLLM Documentation in Simplified Chinese ☆22 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆29 · Updated this week
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆317 · Updated 2 weeks ago
- Large models / LLMs: inference and deployment, theory and practice ☆140 · Updated 2 weeks ago
- LLM101n: Let's build a Storyteller (Chinese edition) ☆121 · Updated 5 months ago
- C++ implementation of Qwen-LM ☆569 · Updated last month
- Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline mod… ☆373 · Updated 4 months ago
- llm-inference is a platform for publishing and managing LLM inference, providing a wide range of out-of-the-box features for model deploy… ☆74 · Updated 8 months ago
- ☆302 · Updated 3 weeks ago
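- llama2.c-zh: a small language model that supports Chinese-language scenarios ☆145 · Updated 10 months ago
- This is a user guide for the MiniCPM and MiniCPM-V series of small language models (SLMs) developed by ModelBest. "面壁小钢炮" ("Little Steel Cannon", the series' Chinese nickname) focuses on achi… ☆171 · Updated 2 months ago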
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆582 · Updated 3 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆290 · Updated this week
- Walk through ChatGPT's technical roadmap from scratch. ☆184 · Updated 4 months ago
- FlagGems is an operator library for large language models implemented in Triton Language. ☆397 · Updated this week
- A lightweight LLM inference framework ☆712 · Updated 9 months ago
- Run generative AI models with a simple C++/Python API using OpenVINO Runtime ☆198 · Updated this week
- Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for loading a local knowledge base through LangChain for retrieval-augmented generation (RAG). ☆516 · Updated 6 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆382 · Updated this week
- Triton Documentation in Simplified Chinese ☆52 · Updated last week
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking ☆357 · Updated this week
- Mixture-of-Experts (MoE) Language Model ☆184 · Updated 4 months ago
- Pretrain, fine-tune and serve LLMs on Intel platforms with Ray ☆108 · Updated 2 months ago