groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆39 · Updated 2 months ago
Alternatives and similar repositories for mlagility
Users who are interested in mlagility are comparing it to the libraries listed below.
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆40 · Updated last month
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 4 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated last year
- python package of rocm-smi-lib ☆20 · Updated 7 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆109 · Updated 2 months ago
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆38 · Updated 9 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆106 · Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆61 · Updated 3 months ago
- A curated list for Efficient Large Language Models ☆11 · Updated last year
- Cray-LM unified training and inference stack. ☆22 · Updated 3 months ago
- ☆68 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆44 · Updated last week
- QuIP quantization ☆52 · Updated last year
- High-Performance SGEMM on CUDA devices ☆91 · Updated 3 months ago
- Prototype routines for GPU quantization written using PyTorch. ☆21 · Updated 3 months ago
- Explore training for quantized models ☆18 · Updated 4 months ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆49 · Updated last year
- Open Source Projects from Pallas Lab ☆20 · Updated 3 years ago
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 7 months ago
- ☆32 · Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆41 · Updated last year
- Self-host LLMs with LMDeploy and BentoML ☆18 · Updated last month
- ☆52 · Updated 2 weeks ago
- LLM training in simple, raw C/CUDA ☆94 · Updated last year
- E2E AutoML Model Compression Package ☆46 · Updated 2 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆61 · Updated 2 months ago