groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆38 · Updated last week
Related projects
Alternatives and complementary repositories for mlagility
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆34 · Updated 5 months ago
- Intel Gaudi's Megatron-DeepSpeed for training large language models ☆13 · Updated 3 weeks ago
- AMD-related optimizations for transformer models ☆57 · Updated this week
- Python package of rocm-smi-lib ☆18 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆46 · Updated this week
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing tho… ☆99 · Updated last week
- Attention in SRAM on Tenstorrent Grayskull ☆29 · Updated 3 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- Prototype routines for GPU quantization written in PyTorch ☆19 · Updated last week
- Collection of kernels written in the Triton language (a minimal Triton kernel sketch follows this list) ☆63 · Updated last week
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆133 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆152 · Updated this week
- FlexAttention with FlashAttention-3 support ☆26 · Updated last month
- PB-LLM: Partially Binarized Large Language Models ☆146 · Updated 11 months ago
- Applied AI experiments and examples for PyTorch ☆159 · Updated last week
- Fast Inference of MoE Models with CPU-GPU Orchestration ☆170 · Updated last week
- ring-attention experiments ☆95 · Updated 3 weeks ago
- MLPerf™ logging library ☆30 · Updated last week
- LLM training in simple, raw C/CUDA ☆86 · Updated 6 months ago
- Example of applying CUDA graphs to LLaMA-v2 ☆10 · Updated last year
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks ☆53 · Updated this week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆47 · Updated this week
- Notes and artifacts from the ONNX steering committee ☆25 · Updated last week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆35 · Updated 3 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆196 · Updated last week
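For context on the Triton kernel collection mentioned above, here is a minimal sketch of what a Triton kernel looks like: an elementwise vector add, following the style of the official Triton tutorials. It is illustrative only and not taken from any listed repository; the names `add_kernel`, `add`, and the `BLOCK_SIZE=1024` choice are assumptions for the example.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the partial tail block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Launch one program per BLOCK_SIZE elements (illustrative block size).
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    a = torch.randn(4096, device="cuda")
    b = torch.randn(4096, device="cuda")
    assert torch.allclose(add(a, b), a + b)
```

The kernel collections and matmul repositories listed above follow the same pattern (a `@triton.jit` function plus a Python launcher), just with more elaborate tiling and quantization logic.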