groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆40 · Updated 4 months ago
Alternatives and similar repositories for mlagility
Users interested in mlagility are comparing it to the libraries listed below.
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆63 · Updated 6 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 · Updated 4 months ago
- Python package of rocm-smi-lib ☆24 · Updated 2 weeks ago
- TORCH_LOGS parser for PT2 ☆70 · Updated last month
- Notes and artifacts from the ONNX steering committee ☆27 · Updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 · Updated last week
- Memory Optimizations for Deep Learning (ICML 2023) ☆114 · Updated last year
- A Data-Centric Compiler for Machine Learning ☆85 · Updated 2 weeks ago
- MLPerf™ logging library ☆37 · Updated last week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 · Updated 11 months ago
- High-Performance SGEMM on CUDA devices ☆114 · Updated 11 months ago
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆16 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆94 · Updated this week
- ☆12 · Updated 4 months ago
- Benchmarks to capture important workloads. ☆31 · Updated 10 months ago
- Benchmarking different models with PyTorch 2.0 ☆20 · Updated 2 years ago
- ☆71 · Updated 9 months ago
- ML model training for edge devices ☆167 · Updated 2 years ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆114 · Updated 4 months ago
- Home for OctoML PyTorch Profiler ☆114 · Updated 2 years ago
- LLM training in simple, raw C/CUDA ☆108 · Updated last year
- Efficient in-memory representation for ONNX, in Python ☆37 · Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆41 · Updated last year
- ☆120 · Updated last year
- Parallel framework for training and fine-tuning deep neural networks ☆71 · Updated last month
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated last year
- ☆219 · Updated 11 months ago
- Explore training for quantized models ☆25 · Updated 5 months ago
- PB-LLM: Partially Binarized Large Language Models ☆157 · Updated 2 years ago