vaibhawvipul / performance-engineering
☆24Updated 2 years ago
Alternatives and similar repositories for performance-engineering:
Users who are interested in performance-engineering are comparing it to the libraries listed below
- LLM training in simple, raw C/CUDA☆92Updated 10 months ago
- ☆27Updated 2 months ago
- Benchmark tests supporting the TiledCUDA library.☆15Updated 4 months ago
- PTX-Tutorial Written Purely By AIs (Deep Research of OpenAI and Claude 3.7)☆60Updated this week
- A tracing JIT compiler for PyTorch☆13Updated 3 years ago
- Benchmarks to capture important workloads.☆30Updated last month
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools☆38Updated 3 weeks ago
- How to Build an LLVM Backend, published by Packt☆22Updated last month
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆41Updated this week
- Worked example of the process from Python source to CUDA kernel execution with Numba☆37Updated 6 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆53Updated 3 weeks ago
- ML/DL Math and Method notes☆59Updated last year
- Make Triton easier☆47Updated 9 months ago
- The CUDA target for Numba☆91Updated this week
- A repository of PyTorch examples☆9Updated last year
- Inference Llama 2 in C++☆44Updated 11 months ago
- Sample Code for “Sequential and Parallel Algorithms and Data Structures -- The Basic Toolbox” Book☆25Updated 7 years ago
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C.☆21Updated 8 months ago
- Random number library that generates pseudo-random and quasi-random numbers.☆26Updated this week
- benchmarking some transformer deployments☆26Updated 2 years ago
- Learn CUDA with PyTorch☆19Updated last month
- Some CUDA example code with READMEs.☆93Updated 3 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best!☆34Updated this week
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)☆28Updated last year
- CUDA Templates for Linear Algebra Subroutines☆16Updated this week
- FAST Randomized SVD on a GPU with CUDA 🏎️☆11Updated 5 years ago
- Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification☆11Updated last year
- ☆14Updated 4 months ago
- Loop Nest - Linear algebra compiler and code generator.☆22Updated 2 years ago
- Awesome Triton Resources☆20Updated 3 months ago