dblalock / bolt
10x faster matrix and vector operations
☆2,480 Updated 2 years ago
Alternatives and similar repositories for bolt:
Users who are interested in bolt are comparing it to the libraries listed below:
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… ☆4,630 Updated 3 weeks ago
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,313 Updated this week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,564 Updated last year
- Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tupl… ☆809 Updated last month
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,040 Updated last year
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,380 Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,112 Updated this week
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,343 Updated 3 weeks ago
- Library for reading and writing large multi-dimensional arrays. ☆1,401 Updated this week
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl ☆2,300 Updated last year
- cuML - RAPIDS Machine Learning Library ☆4,652 Updated this week
- Hummingbird compiles trained ML models into tensor computation for faster inference. ☆3,434 Updated last week
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆2,189 Updated last month
- PyTorch extensions for high performance and large scale training. ☆3,306 Updated 2 weeks ago
- Library for 8-bit optimizers and quantization routines. ☆716 Updated 2 years ago
- PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone. ☆758 Updated 2 years ago
- A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries. ☆1,190 Updated this week
- ☆2,736 Updated last year
- ArrayFire: a general purpose GPU library. ☆4,685 Updated 3 weeks ago
- functorch is JAX-like composable function transforms for PyTorch. ☆1,422 Updated this week
- A C++ standalone library for machine learning ☆5,361 Updated 3 weeks ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,684 Updated 6 months ago
- Train to 94% on CIFAR-10 in <6.3 seconds on a single A100. Or ~95.79% in ~110 seconds (or less!) ☆1,252 Updated 4 months ago
- Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains ☆1,741 Updated last year
- common in-memory tensor structure ☆982 Updated 2 weeks ago
- FFCV: Fast Forward Computer Vision (and other ML workloads!) ☆2,921 Updated 10 months ago
- A uniform interface to run deep learning models from multiple frameworks ☆935 Updated last year
- Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others) ☆8,861 Updated 2 months ago
- JAX-based neural network library ☆3,013 Updated this week
- ☆471 Updated 3 years ago