neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

☆3,157

Alternatives and similar repositories for deepsparse

Users who are interested in deepsparse are comparing it to the libraries listed below.

neuralmagic / sparseml
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
☆2,147 Updated last month
neuralmagic / sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
☆392 Updated last month
neuralmagic / docs
Top-level directory for documentation and general content
☆122 Updated last month
neuralmagic / sparsify
ML model optimization product to accelerate inference.
☆326 Updated last month
huggingface / optimum
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization…
☆2,977 Updated this week
ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…
☆1,575 Updated last year
intel / neural-compressor
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R…
☆2,449 Updated this week
ELS-RD / transformer-deploy
Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
☆1,688 Updated 8 months ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,167 Updated 9 months ago
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,031 Updated 2 weeks ago
Lightning-AI / lightning-thunder
PyTorch compiler that accelerates training and inference. Get built-in optimizations for performance, memory, parallelism, and easily wri…
☆1,375 Updated this week
facebookincubator / AITemplate
AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N…
☆4,655 Updated 3 months ago
triton-inference-server / pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
☆806 Updated last week
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,868 Updated last year
pytorch / ao
PyTorch native quantization and sparsity for training and inference
☆2,168 Updated this week
openxla / xla
A machine learning compiler for GPUs, CPUs, and ML accelerators
☆3,341 Updated this week
huggingface / safetensors
Simple, safe way to store and distribute tensors
☆3,345 Updated last week
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆7,212 Updated this week
JonasGeiping / cramming
Cramming the training of a (BERT-type) language model into limited compute.
☆1,338 Updated last year
tysam-code / hlb-CIFAR10
Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!)
☆1,271 Updated 6 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,238 Updated last year
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆962 Updated last week
microsoft / Llama-2-Onnx
☆1,029 Updated last year
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,140 Updated this week
tairov / llama2.mojo
Inference Llama 2 in one file of pure 🔥
☆2,115 Updated last year
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,139 Updated last year
huggingface / optimum-nvidia
☆986 Updated 5 months ago
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,809 Updated this week
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,548 Updated this week
explosion / curated-transformers
🤖 A PyTorch library of curated Transformer models and their composable components
☆892 Updated last year