aws-neuron / nki-samples
☆56 Updated 3 weeks ago
Alternatives and similar repositories for nki-samples
Users who are interested in nki-samples are comparing it to the libraries listed below.
- ☆63 Updated 2 weeks ago
- Project showing how to develop NKI kernels for Llama 3.2 1B inference ☆19 Updated 6 months ago
- extensible collectives library in triton ☆91 Updated 8 months ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts". ☆81 Updated 2 months ago
- A schedule language for large model training ☆151 Updated 3 months ago
- Ship correct and fast LLM kernels to PyTorch ☆124 Updated 3 weeks ago
- ☆94 Updated last year
- ☆256 Updated last week
- ☆15 Updated last week
- ☆39 Updated 11 months ago
- Collection of kernels written in Triton language ☆172 Updated 8 months ago
- Applied AI experiments and examples for PyTorch ☆308 Updated 3 months ago
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆98 Updated 5 months ago
- Autonomous GPU Kernel Generation via Deep Agents ☆172 Updated this week
- ☆28 Updated 10 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆217 Updated last week
- Cataloging released Triton kernels. ☆274 Updated 2 months ago
- MLIR-based partitioning system ☆151 Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆132 Updated 6 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆119 Updated last week
- Example code for AWS Neuron SDK developers building inference and training applications ☆151 Updated 3 weeks ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆170 Updated last week
- GitHub mirror of triton-lang/triton repo. ☆100 Updated this week
- ☆148 Updated 11 months ago
- ☆113 Updated last year
- Building the Virtuous Cycle for AI-driven LLM Systems ☆93 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆151 Updated 2 years ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆130 Updated 3 weeks ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆175 Updated 2 weeks ago
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago