awslabs / nki-autotune
☆15 Updated this week
Alternatives and similar repositories for nki-autotune
Users that are interested in nki-autotune are comparing it to the libraries listed below
- ☆60 Updated last month
- ☆51 Updated 3 weeks ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆19 Updated 4 months ago
- Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and i… ☆548 Updated this week
- ☆21 Updated last month
- Example code for AWS Neuron SDK developers building inference and training applications ☆149 Updated last week
- ☆39 Updated 10 months ago
- Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS. ☆355 Updated last week
- ☆14 Updated last year
- ☆110 Updated 9 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆612 Updated 2 weeks ago
- ☆12 Updated 4 months ago
- ☆13 Updated last month
- ☆178 Updated last year
- NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the … ☆226 Updated last week
- Perplexity GPU Kernels ☆497 Updated last month
- ☆56 Updated last month
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆215 Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆21 Updated this week
- ☆542 Updated last year
- ☆10 Updated 2 years ago
- Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips. ☆245 Updated this week
- A Quirky Assortment of CuTe Kernels ☆637 Updated 2 weeks ago
- A CLI tool that helps manage training jobs on the SageMaker HyperPod clusters orchestrated by Amazon EKS ☆32 Updated last week
- ☆242 Updated this week
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆114 Updated last week
- ☆121 Updated this week
- Building blocks for foundation models. ☆566 Updated last year
- ☆14 Updated 11 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆491 Updated this week