aws-neuron / nki-llama
Project showing how to develop NKI kernels for Llama 3.2 1B inference
☆19 · Updated 4 months ago
Alternatives and similar repositories for nki-llama
Users that are interested in nki-llama are comparing it to the libraries listed below
- ☆51 · Updated 3 weeks ago
- ☆60 · Updated last month
- ☆15 · Updated this week
- ☆141 · Updated 9 months ago
- Perplexity GPU Kernels ☆497 · Updated last month
- A schedule language for large model training ☆151 · Updated 2 months ago
- ☆14 · Updated last year
- ☆241 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆215 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆612 · Updated 2 weeks ago
- QuickReduce is a performant all-reduce library designed for AMD ROCm that supports inline compression. ☆33 · Updated last month
- ☆83 · Updated 2 years ago
- ☆242 · Updated this week
- GitHub mirror of the triton-lang/triton repo. ☆86 · Updated last week
- ☆27 · Updated this week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆491 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆226 · Updated last week
- Applied AI experiments and examples for PyTorch ☆299 · Updated 2 months ago
- ☆150 · Updated 5 months ago
- ☆53 · Updated 4 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆114 · Updated last week
- Fastest kernels written from scratch ☆377 · Updated last month
- A curated list of awesome projects and papers for distributed training or inference ☆247 · Updated last year
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆221 · Updated 2 years ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 · Updated 5 months ago
- ☆92 · Updated 11 months ago
- ☆75 · Updated 4 years ago
- nnScaler: Compiling DNN models for Parallel Training ☆117 · Updated last month
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆41 · Updated 2 years ago
- ☆23 · Updated 2 months ago