vaibhawvipul / performance-engineering
☆24 · Updated 2 years ago
Alternatives and similar repositories for performance-engineering:
Users who are interested in performance-engineering are comparing it to the libraries listed below:
- ⛰️ RockyML - A High-Performance Scientific Computing Framework for Non-smooth Machine Learning Problems ☆19 · Updated last year
- Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization, in pure C. ☆21 · Updated 7 months ago
- A tracing JIT compiler for PyTorch ☆12 · Updated 3 years ago
- LLM training in simple, raw C/CUDA ☆91 · Updated 9 months ago
- ☆17 · Updated 2 weeks ago
- Benchmarking some transformer deployments ☆26 · Updated last year
- Make Triton easier ☆44 · Updated 8 months ago
- Learn CUDA with PyTorch ☆16 · Updated 3 weeks ago
- A Gentle Principled Introduction to Deep Reinforcement Learning ☆19 · Updated 3 months ago
- Standalone command-line tool for compiling Triton kernels ☆17 · Updated 5 months ago
- Benchmark tests supporting the TiledCUDA library. ☆15 · Updated 3 months ago
- Awesome Triton Resources ☆20 · Updated 2 months ago
- Benchmarking PyTorch 2.0 different models ☆21 · Updated last year
- Benchmarks to capture important workloads. ☆29 · Updated 3 weeks ago
- SKIP for AI ☆20 · Updated 5 years ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 4 months ago
- CUDA Templates for Linear Algebra Subroutines ☆14 · Updated this week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 · Updated last week
- Worked example of the process from Python source to CUDA kernel execution with Numba ☆37 · Updated 5 months ago
- Learning about CUDA by writing PTX code. ☆35 · Updated 11 months ago
- ML/DL Math and Method notes ☆58 · Updated last year
- A parallel framework for training deep neural networks ☆52 · Updated 3 weeks ago
- Sample Code for “Sequential and Parallel Algorithms and Data Structures -- The Basic Toolbox” Book ☆25 · Updated 7 years ago
- ☆25 · Updated last month
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions ☆24 · Updated this week
- End to End steps for adding custom ops in PyTorch. ☆20 · Updated 4 years ago
- A repository of PyTorch examples ☆10 · Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 · Updated last year
- ☆32 · Updated 4 years ago