groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆40Updated 6 months ago
Alternatives and similar repositories for mlagility
Users who are interested in mlagility are comparing it to the libraries listed below
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training☆18Updated last year
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.☆64Updated last year
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho…☆115Updated 6 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best!☆71Updated this week
- python package of rocm-smi-lib☆24Updated last month
- ☆71Updated 10 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆49Updated 5 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆93Updated this week
- Benchmarks to capture important workloads.☆32Updated 2 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…☆64Updated 7 months ago
- High-Performance FP32 GEMM on CUDA devices☆117Updated last year
- MLPerf™ logging library☆38Updated last month
- Memory Optimizations for Deep Learning (ICML 2023)☆115Updated last year
- Explore training for quantized models☆26Updated 6 months ago
- TORCH_TRACE parser for PT2☆76Updated this week
- LLM training in simple, raw C/CUDA☆112Updated last year
- Home for OctoML PyTorch Profiler☆113Updated 2 years ago
- We aim to redefine Data Parallel libraries' portability, performance, programmability and maintainability by using C++ standard features, i…☆47Updated this week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev…☆64Updated 4 months ago
- A Data-Centric Compiler for Machine Learning☆85Updated last month
- Test suite for probing the numerical behavior of NVIDIA tensor cores☆41Updated last year
- Parallel framework for training and fine-tuning deep neural networks☆70Updated 3 months ago
- Repository of model demos using TT-Buda☆63Updated 10 months ago
- ☆38Updated last year
- Efficient in-memory representation for ONNX, in Python☆42Updated this week
- LLM-Inference-Bench☆58Updated 6 months ago
- ☆27Updated 2 years ago
- PB-LLM: Partially Binarized Large Language Models☆157Updated 2 years ago
- Prototype routines for GPU quantization written using PyTorch.☆21Updated 3 weeks ago
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar…☆96Updated last month