groq / mlagility
Machine Learning Agility (MLAgility) benchmark and benchmarking tools
☆38 · Updated 3 weeks ago
Alternatives and similar repositories for mlagility:
Users interested in mlagility are comparing it to the libraries listed below.
- An experimental CPU backend for Triton (https://github.com/openai/triton) · ☆40 · updated last week
- Interactive performance profiling and debugging tool for PyTorch neural networks · ☆59 · updated 2 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging · ☆35 · updated 11 months ago
- High-performance SGEMM on CUDA devices · ☆87 · updated 2 months ago
- GroqFlow provides an automated tool flow for compiling machine learning and linear algebra workloads into Groq programs and executing those programs · ☆108 · updated 3 weeks ago
- Attention in SRAM on Tenstorrent Grayskull · ☆32 · updated 8 months ago
- Open Source Projects from Pallas Lab · ☆20 · updated 3 years ago
- Example of applying CUDA graphs to LLaMA-v2 · ☆12 · updated last year
- Notes and artifacts from the ONNX steering committee · ☆25 · updated this week
- vLLM: a high-throughput and memory-efficient inference and serving engine for LLMs · ☆87 · updated this week
- TORCH_LOGS parser for PT2 · ☆36 · updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆91 · updated this week
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry · ☆40 · updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization · ☆104 · updated 5 months ago
- Framework to reduce autotune overhead to zero for well-known deployments · ☆63 · updated last week
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆35 · updated this week
- FlexAttention with FlashAttention-3 support · ☆26 · updated 5 months ago
- LLM training in simple, raw C/CUDA · ☆92 · updated 10 months ago
- PB-LLM: Partially Binarized Large Language Models · ☆152 · updated last year
- PTX tutorial written purely by AIs (OpenAI Deep Research and Claude 3.7) · ☆60 · updated last week
- QuIP quantization · ☆52 · updated last year
- Repository for CPU kernel generation for LLM inference · ☆25 · updated last year
- Open deep learning compiler stack for CPU, GPU, and specialized accelerators · ☆18 · updated last week
- Explore training for quantized models · ☆17 · updated 2 months ago
- Intel Gaudi's Megatron-DeepSpeed for training large language models · ☆13 · updated 3 months ago