aws-neuron / nki-samples
☆37 · Updated 3 weeks ago
Alternatives and similar repositories for nki-samples
Users interested in nki-samples are comparing it to the repositories listed below.
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆14 · Updated 3 weeks ago
- ☆58 · Updated last month
- ☆14 · Updated last week
- ☆38 · Updated 6 months ago
- A schedule language for large model training ☆149 · Updated last year
- Example code for AWS Neuron SDK developers building inference and training applications ☆149 · Updated 2 weeks ago
- Extensible collectives library in Triton ☆86 · Updated 2 months ago
- ☆27 · Updated 6 months ago
- Distributed preprocessing and data loading for language datasets ☆39 · Updated last year
- ☆43 · Updated last year
- Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips. ☆230 · Updated this week
- ☆28 · Updated 5 months ago
- ☆23 · Updated 7 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆166 · Updated this week
- ☆81 · Updated 7 months ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆179 · Updated 2 weeks ago
- ☆24 · Updated last year
- ☆110 · Updated 5 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆204 · Updated this week
- MLIR-based partitioning system ☆97 · Updated this week
- Collection of kernels written in the Triton language ☆132 · Updated 2 months ago
- ☆105 · Updated 10 months ago
- A resilient distributed training framework ☆95 · Updated last year
- A plugin that lets EC2 developers use libfabric as the network provider while running NCCL applications. ☆176 · Updated this week
- Evaluating Large Language Models for CUDA Code Generation: ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆50 · Updated last week
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆61 · Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆18 · Updated this week
- TritonParse is a tool designed to help developers analyze and debug Triton kernels by visualizing the compilation process and source code… ☆93 · Updated last week
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆39 · Updated 2 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆214 · Updated last year