intel / ipex-llm
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.
☆8,426 · Updated 3 weeks ago
Alternatives and similar repositories for ipex-llm
Users interested in ipex-llm are comparing it to the libraries listed below.
- Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray ☆24 · Updated 5 years ago
- BigDL: Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray ☆2,687 · Updated 3 weeks ago
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ☆3,870 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆61,727 · Updated this week
- SGLang is a fast serving framework for large language models and vision language models. ☆19,462 · Updated last week
- A Python package extending official PyTorch to easily obtain performance on Intel platforms ☆1,986 · Updated this week
- Open deep learning compiler stack for CPU, GPU and specialized accelerators ☆12,776 · Updated this week
- High-speed Large Language Model Serving for Local Deployment ☆8,374 · Updated 3 months ago
- Accelerate LLM with low-bit (FP4 / INT4 / FP8 / INT8) optimizations using ipex-llm ☆169 · Updated 6 months ago
- oneAPI Deep Neural Network Library (oneDNN) ☆3,905 · Updated this week
- Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. ☆14,620 · Updated this week
- Development repository for the Triton language and compiler ☆17,392 · Updated last week
- Fast and memory-efficient exact attention ☆20,280 · Updated this week
- Simple and Distributed Machine Learning ☆5,173 · Updated this week
- Machine Learning Toolkit for Kubernetes ☆15,257 · Updated 2 months ago
- Go ahead and axolotl questions ☆10,716 · Updated this week
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,165 · Updated last year
- Tensor library for machine learning ☆13,361 · Updated this week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. ☆9,972 · Updated this week
- Transformer-related optimization, including BERT, GPT ☆6,338 · Updated last year
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… ☆12,008 · Updated this week
- An open source ML system for the end-to-end data science lifecycle ☆1,067 · Updated this week
- OpenVINO™ is an open source toolkit for optimizing and deploying AI inference ☆9,119 · Updated last week
- Breeze is/was a numerical processing library for Scala. ☆3,456 · Updated last month
- Accessible large language models via k-bit quantization for PyTorch. ☆7,716 · Updated this week
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,521 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI libraries for accelerating ML workloads. ☆39,664 · Updated this week
- Build and run Docker containers leveraging NVIDIA GPUs ☆17,431 · Updated last year
- Interactive and Reactive Data Science using Scala and Spark. ☆3,153 · Updated 2 years ago
- Large Language Model Text Generation Inference ☆10,605 · Updated last month
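Several of the libraries above (ipex-llm, bitsandbytes, neural-compressor) center on low-bit weight quantization. As a rough illustration of the core idea only — this is a toy sketch, not code from any of these projects — a symmetric per-tensor INT4 quantizer can be written in a few lines:

```python
# Illustrative sketch only: symmetric per-tensor INT4 quantization,
# the basic idea behind the low-bit (INT4/FP4) optimizations listed
# above. Real libraries add per-group scales, NF4, calibration, and
# fused kernels on top of this.

def quantize_int4(weights):
    """Map floats to integers in [-8, 7] using one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0
    if scale == 0.0:          # all-zero tensor: any scale works
        scale = 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float values from INT4 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -1.4, 0.07]
q, scale = quantize_int4(weights)
restored = dequantize_int4(q, scale)
# Each restored weight is within one quantization step of the original.
assert all(abs(a - b) < scale for a, b in zip(weights, restored))
```

Storing 4 bits per weight instead of 32 cuts memory roughly 8x at the cost of rounding error, which is why the serving engines in this list pair low-bit storage with careful per-group scaling and calibration.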