groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆38 · Updated 2 months ago
Alternatives and similar repositories for mlagility:
Users interested in mlagility also compare it to the libraries listed below.
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 · Updated 9 months ago
- Intel Gaudi's Megatron-DeepSpeed for training large language models ☆13 · Updated 2 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆11 · Updated last year
- Code and materials for speeding up LLM inference using token merging ☆35 · Updated 9 months ago
- Python package for rocm-smi-lib ☆20 · Updated 4 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆106 · Updated 2 months ago
- High-performance SGEMM on CUDA devices ☆74 · Updated 3 weeks ago
- Sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- ☆59 · Updated 2 weeks ago
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and Hugging Face Trainer finetuning ☆34 · Updated this week
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆75 · Updated this week
- Example ML projects that use the Determined library ☆26 · Updated 5 months ago
- CPU kernel generation for LLM inference ☆25 · Updated last year
- AMD-related optimizations for transformer models ☆67 · Updated 3 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆25 · Updated 3 months ago
- A minimal implementation of vLLM ☆33 · Updated 6 months ago
- Open-source projects from Pallas Lab ☆20 · Updated 3 years ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆102 · Updated 4 months ago
- Attention in SRAM on the Tenstorrent Grayskull ☆31 · Updated 7 months ago
- LLM training in simple, raw C/CUDA ☆91 · Updated 9 months ago
- Train, tune, and run inference with the Bamba model ☆84 · Updated last month
- ☆25 · Updated last year
- ☆34 · Updated this week
- TORCH_LOGS parser for PT2 ☆32 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆18 · Updated this week
- ☆21 · Updated last week
- Explore training for quantized models ☆15 · Updated last month
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆37 · Updated 6 months ago