groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆39 Updated last month
Alternatives and similar repositories for mlagility
Users who are interested in mlagility are comparing it to the libraries listed below
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 Updated 9 months ago
- python package of rocm-smi-lib ☆23 Updated 2 months ago
- ☆74 Updated 5 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆89 Updated this week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 Updated 7 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆45 Updated last month
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note… ☆62 Updated 2 months ago
- Example ML projects that use the Determined library. ☆32 Updated last year
- A Data-Centric Compiler for Machine Learning ☆84 Updated last year
- LLM-Inference-Bench ☆51 Updated 2 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 Updated last year
- A parallel framework for training deep neural networks ☆63 Updated 6 months ago
- Benchmarking PyTorch 2.0 different models ☆20 Updated 2 years ago
- ☆92 Updated 3 weeks ago
- High-Performance SGEMM on CUDA devices ☆101 Updated 7 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆57 Updated this week
- ☆43 Updated last week
- Benchmarks to capture important workloads. ☆31 Updated 7 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆107 Updated last year
- MLPerf™ logging library ☆36 Updated last week
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆112 Updated last month
- ☆38 Updated last year
- TORCH_LOGS parser for PT2 ☆60 Updated this week
- PB-LLM: Partially Binarized Large Language Models ☆153 Updated last year
- ☆27 Updated last year
- Supplementary material for our paper "Compute Trends Across Three Eras of Machine Learning". ☆41 Updated 3 years ago
- Attention in SRAM on Tenstorrent Grayskull ☆38 Updated last year
- ML model training for edge devices ☆166 Updated last year
- LLM training in simple, raw C/CUDA ☆104 Updated last year
- Framework to reduce autotune overhead to zero for well known deployments. ☆82 Updated this week