aws-neuron / nki-samples

☆40

Alternatives and similar repositories for nki-samples

Users who are interested in nki-samples are comparing it to the repositories listed below.

aws-neuron / neuronx-distributed
☆60 Updated last week
aws-neuron / nki-llama
Project showing how to develop NKI kernels for Llama 3.2 1B inference
☆19 Updated 2 months ago
awslabs / slapo
A schedule language for large model training
☆149 Updated last year
cchan / tccl
Extensible collectives library in Triton
☆88 Updated 4 months ago
aws-neuron / neuronx-nemo-megatron
☆39 Updated 7 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207 Updated this week
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212 Updated this week
Deep-Learning-Profiling-Tools / triton-viz
☆227 Updated this week
triton-lang / kernels
☆85 Updated 9 months ago
gpu-mode / triton-index
Cataloging released Triton kernels.
☆247 Updated 6 months ago
awslabs / nki-autotune
☆14 Updated this week
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆216 Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
NVIDIA / nvidia-resiliency-ext
NVIDIA Resiliency Extension is a python package for framework developers and users to implement fault-tolerant features. It improves the …
☆196 Updated last week
aws-neuron / transformers-neuronx
☆112 Updated 6 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆40 Updated 2 years ago
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆505 Updated last week
awslabs / ratex
☆23 Updated 8 months ago
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆107 Updated 2 months ago
octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆113 Updated 2 years ago
aws-neuron / aws-neuron-samples
Example code for AWS Neuron SDK developers building inference and training applications
☆148 Updated this week
pytorch-labs / tritonbench
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
☆199 Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
intel / torch-ccl
oneCCL Bindings for Pytorch*
☆99 Updated this week
yanring / Megatron-MoE-ModelZoo
Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE.
☆45 Updated last week
stanford-futuredata / stk
☆107 Updated 11 months ago
openxla / shardy
MLIR-based partitioning system
☆115 Updated this week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339 Updated this week
hao-ai-lab / MuxServe
☆67 Updated last year
ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆418 Updated 3 weeks ago