groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆39 Updated 2 months ago
Alternatives and similar repositories for mlagility
Users who are interested in mlagility are comparing it to the libraries listed below.
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆43 Updated 4 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆61 Updated last month
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated 7 months ago
- python package of rocm-smi-lib ☆22 Updated 2 weeks ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 Updated 6 months ago
- ☆74 Updated 4 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆48 Updated this week
- A Data-Centric Compiler for Machine Learning ☆84 Updated last year
- High-Performance SGEMM on CUDA devices ☆98 Updated 6 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆40 Updated last year
- Notes and artifacts from the ONNX steering committee ☆26 Updated this week
- Benchmarks to capture important workloads. ☆31 Updated 6 months ago
- Home for OctoML PyTorch Profiler ☆113 Updated 2 years ago
- ☆28 Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆102 Updated last year
- ☆37 Updated last year
- Benchmarking PyTorch 2.0 different models ☆20 Updated 2 years ago
- Training material for IPU users: tutorials, feature examples, simple applications ☆86 Updated 2 years ago
- TORCH_LOGS parser for PT2 ☆47 Updated this week
- LLM training in simple, raw C/CUDA ☆102 Updated last year
- ☆120 Updated last year
- A parallel framework for training deep neural networks ☆63 Updated 4 months ago
- MLIR-based partitioning system ☆115 Updated this week
- Attention in SRAM on Tenstorrent Grayskull ☆37 Updated last year
- MLPerf™ logging library ☆37 Updated this week
- oneCCL Bindings for Pytorch* ☆99 Updated 3 weeks ago
- Example of applying CUDA graphs to LLaMA-v2 ☆12 Updated last year
- ☆41 Updated this week
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆57 Updated last month