dblalock / bolt
10x faster matrix and vector operations
☆2,485 · Updated 2 years ago
Alternatives and similar repositories for bolt
Users who are interested in bolt are comparing it to the libraries listed below:
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,044 · Updated last year
- Library for reading and writing large multi-dimensional arrays. ☆1,407 · Updated this week
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/ ☆1,324 · Updated this week
- ☆471 · Updated 3 years ago
- High-efficiency floating-point neural network inference operators for mobile, server, and Web ☆2,016 · Updated this week
- A machine learning compiler for GPUs, CPUs, and ML accelerators ☆3,169 · Updated this week
- Codebase for "SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems" ☆1,096 · Updated 4 years ago
- Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, an… ☆1,370 · Updated this week
- PatrickStar enables Larger, Faster, Greener Pretrained Models for NLP and democratizes AI for everyone. ☆761 · Updated 2 years ago
- Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tupl… ☆810 · Updated 2 months ago
- An efficient C++17 GPU numerical computing library with Python-like syntax ☆1,321 · Updated last week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,565 · Updated last year
- A retargetable MLIR-based machine learning compiler and runtime toolkit. ☆3,132 · Updated this week
- Hummingbird compiles trained ML models into tensor computation for faster inference. ☆3,438 · Updated last month
- Fast Block Sparse Matrices for Pytorch ☆545 · Updated 4 years ago
- CUDA Templates for Linear Algebra Subroutines ☆7,540 · Updated this week
- A uniform interface to run deep learning models from multiple frameworks ☆935 · Updated last year
- An open-source efficient deep learning framework/compiler, written in python. ☆696 · Updated this week
- oneAPI Deep Neural Network Library (oneDNN) ☆3,790 · Updated this week
- Efficient GPU kernels for block-sparse matrix multiplication and convolution ☆1,040 · Updated last year
- Transformer-related optimization, including BERT and GPT ☆6,158 · Updated last year
- GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compre… ☆335 · Updated last month
- A C++ standalone library for machine learning ☆5,374 · Updated last month
- The Tensor Algebra SuperOptimizer for Deep Learning ☆711 · Updated 2 years ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description. ☆984 · Updated 8 months ago
- tree is a library for working with nested data structures ☆983 · Updated 3 months ago
- functorch is JAX-like composable function transforms for PyTorch. ☆1,424 · Updated this week
- A performant and modular runtime for TensorFlow ☆761 · Updated last month
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,682 · Updated 6 months ago
- Common in-memory tensor structure ☆986 · Updated this week