intel / ipex-llm-tutorial
Accelerate LLMs with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm
☆168 · Updated 8 months ago
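To illustrate what the low-bit optimizations in the headline refer to: quantization stores weights in 4 or 8 bits instead of FP16/FP32, trading a small, bounded rounding error for memory and bandwidth savings. The sketch below is a minimal NumPy illustration of symmetric per-tensor INT4 quantization; the function names are hypothetical, and real ipex-llm kernels use per-group scales and packed low-bit storage.

```python
import numpy as np

def quantize_int4_sym(w: np.ndarray):
    """Symmetric per-tensor INT4 quantization (illustrative sketch only;
    production kernels quantize per group and pack two values per byte)."""
    scale = np.abs(w).max() / 7.0  # symmetric INT4 range is [-7, 7]
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 8)).astype(np.float32)
    q, s = quantize_int4_sym(w)
    w_hat = dequantize(q, s)
    # Round-to-nearest bounds the error by half a quantization step.
    print(np.abs(w - w_hat).max() <= s / 2 + 1e-6)
```

The same idea extends to the other formats listed (FP4, FP8, INT8): each shrinks the per-weight bit width while keeping a scale factor to map back to floating point.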
Alternatives and similar repositories for ipex-llm-tutorial
Users interested in ipex-llm-tutorial are comparing it to the libraries listed below.
- ☆436 · Updated 4 months ago
- LLM Inference benchmark ☆432 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆274 · Updated 5 months ago
- LLM/MLOps/LLMOps ☆131 · Updated 7 months ago
- Run DeepSeek-R1 GGUFs on KTransformers ☆259 · Updated 10 months ago
- ☆390 · Updated this week
- Small language models for Chinese-language scenarios: llama2.c-zh ☆150 · Updated last year
- ☆181 · Updated this week
- Accelerate inference without tears ☆372 · Updated 2 months ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆478 · Updated this week
- Pretrain, finetune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- A lightweight LLM inference framework ☆748 · Updated last year
- FlagScale is a large-model toolkit built on open-source projects ☆466 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆75 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs) ☆251 · Updated last year
- Low-bit LLM inference on CPU/NPU with lookup tables ☆912 · Updated 7 months ago
- Run generative AI models on Sophgo BM1684X/BM1688 ☆260 · Updated 2 weeks ago
- ☆520 · Updated 2 weeks ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆219 · Updated last week
- LLM101n: Let's build a Storyteller (Chinese translation) ☆138 · Updated last year
- A high-performance inference system for large language models, designed for production environments ☆490 · Updated last month
- ☆72 · Updated 2 weeks ago
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆99 · Updated last month
- Export LLaMA to ONNX ☆137 · Updated last year
- C++ implementation of Qwen-LM ☆617 · Updated last year
- vLLM Documentation in Simplified Chinese / vLLM 中文文档 ☆152 · Updated last month
- Efficient AI inference & serving ☆480 · Updated 2 years ago
- A high-performance deep learning training platform with task-level time-shared GPU scheduling ☆731 · Updated 2 years ago
- Performance testing for LLM inference services ☆44 · Updated 2 years ago
- Run generative AI models with a simple C++/Python API using the OpenVINO Runtime ☆419 · Updated this week