vaibhawvipul / performance-engineering
☆30 · Updated 3 years ago
Alternatives and similar repositories for performance-engineering
Users interested in performance-engineering are comparing it to the repositories listed below.
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆71 · Updated this week
- Some CUDA example code with READMEs. ☆179 · Updated 2 months ago
- ☆15 · Updated 3 months ago
- ☆27 · Updated 2 years ago
- Learning about CUDA by writing PTX code. ☆152 · Updated last year
- ☆95 · Updated this week
- Benchmark tests supporting the TiledCUDA library. ☆18 · Updated last year
- Make Triton easier ☆50 · Updated last year
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 · Updated 6 months ago
- Hand-rolled GPU communications library ☆82 · Updated 2 months ago
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) ☆66 · Updated 10 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆49 · Updated 5 months ago
- Custom PTX instruction benchmark ☆138 · Updated 11 months ago
- TORCH_TRACE parser for PT2 ☆76 · Updated this week
- A curriculum for learning GPU performance engineering, from scratch to what the frontier AI labs do ☆341 · Updated 3 weeks ago
- ☆28 · Updated last year
- A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA ☆64 · Updated 3 weeks ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 · Updated 3 months ago
- Samples demonstrating how to use the Compute Sanitizer tools and public API ☆93 · Updated 2 years ago
- Benchmarks to capture important workloads. ☆32 · Updated 2 weeks ago
- Official problem sets / reference kernels for the GPU MODE leaderboard! ☆201 · Updated this week
- General matrix multiplication using NVIDIA Tensor Cores ☆28 · Updated last year
- TritonParse: a compiler tracer, visualizer, and reproducer for Triton kernels ☆194 · Updated this week
- Framework to reduce autotune overhead to zero for well-known deployments ☆96 · Updated 4 months ago
- Custom kernels in the Triton language for accelerating LLMs ☆27 · Updated last year
- Ship correct and fast LLM kernels to PyTorch ☆140 · Updated 3 weeks ago
- LLM training parallelisms (DP, FSDP, TP, PP) in pure C ☆26 · Updated 2 weeks ago
- Attention in SRAM on Tenstorrent Grayskull ☆40 · Updated last year