OpenVINO-dev-contest / llama2.openvino
⭐ 44 · Updated this week
Related projects:
- Run Generative AI models using native OpenVINO C++ API (⭐ 107, updated this week)
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools (⭐ 380, updated this week)
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (⭐ 87, updated this week)
- An innovative library for efficient LLM inference via low-bit quantization (⭐ 342, updated 2 weeks ago)
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) (⭐ 144, updated this week)
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t…" (⭐ 205, updated this week)
- AMD related optimizations for transformer models (⭐ 46, updated this week)
- Python bindings for ggml (⭐ 125, updated 2 weeks ago)
- (⭐ 110, updated 4 months ago)
- Pretrain, finetune and serve LLMs on Intel platforms with Ray (⭐ 95, updated this week)
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ, and export to onnx/onnx-runtime easily (⭐ 141, updated 3 weeks ago)
- A curated list of OpenVINO based AI projects (⭐ 92, updated 3 weeks ago)
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models (⭐ 129, updated last month)
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… (⭐ 231, updated this week)
- A high-throughput and memory-efficient inference and serving engine for LLMs (⭐ 250, updated this week)
- LLaMa/RWKV onnx models, quantization and testcase (⭐ 345, updated last year)
- (⭐ 170, updated this week)
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… (⭐ 56, updated 3 weeks ago)
- This repository contains Dockerfiles, scripts, yaml files, Helm charts, etc. used to scale out AI containers with versions of TensorFlow … (⭐ 23, updated this week)
- Self-host LLMs with vLLM and BentoML (⭐ 62, updated this week)
- The no-code AI toolchain (⭐ 63, updated last week)
- (⭐ 17, updated this week)
- Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios (⭐ 81, updated last month)
- Large Language Model Text Generation Inference on Habana Gaudi (⭐ 24, updated last week)
- llama.cpp clone with additional SOTA quants and improved CPU performance