vaibhawvipul / performance-engineering
☆27 Updated 2 years ago
Alternatives and similar repositories for performance-engineering
Users who are interested in performance-engineering are comparing it to the libraries listed below
- LLM training in simple, raw C/CUDA ☆95 Updated last year
- How to Build an LLVM Backend, published by Packt ☆25 Updated last week
- ☆27 Updated 4 months ago
- Learn CUDA with PyTorch ☆20 Updated 3 months ago
- Explore training for quantized models ☆18 Updated 4 months ago
- Converting a deep neural network to integer-only inference in native C via uniform quantization and the fixed-point representation. ☆23 Updated 3 years ago
- Benchmarks to capture important workloads. ☆31 Updated 3 months ago
- ⛰️ RockyML - A High-Performance Scientific Computing Framework for Non-smooth Machine Learning Problems ☆19 Updated 2 years ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 Updated this week
- Experimental plugin for scikit-learn that runs some estimators on Intel GPUs via numba-dpex. ☆16 Updated last year
- End-to-end steps for adding custom ops in PyTorch. ☆22 Updated 4 years ago
- ☆20 Updated 9 years ago
- A repository of PyTorch examples ☆9 Updated 2 years ago
- A tracing JIT compiler for PyTorch ☆13 Updated 3 years ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper) ☆28 Updated last year
- Minimal, clean code in pure C for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. ☆21 Updated 10 months ago
- ☆26 Updated last year
- ML/DL Math and Method notes ☆60 Updated last year
- ☆14 Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆39 Updated this week
- A collection of reproducible inference engine benchmarks ☆30 Updated 3 weeks ago
- Benchmark tests supporting the TiledCUDA library. ☆16 Updated 5 months ago
- High-Performance SGEMM on CUDA devices ☆91 Updated 3 months ago
- ☆18 Updated 3 months ago
- A lightweight MLIR Python frontend with support for PyTorch ☆23 Updated 8 months ago
- Notes and code for Programming Massively Parallel Processors ☆11 Updated last month
- General Matrix Multiplication using NVIDIA Tensor Cores ☆17 Updated 3 months ago
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆82 Updated last year
- Make Triton easier ☆47 Updated 11 months ago
- ☆21 Updated 2 months ago