aws-neuron / nki-samples
☆27 Updated 2 weeks ago
Alternatives and similar repositories for nki-samples:
Users who are interested in nki-samples are comparing it to the repositories listed below.
- ☆11 Updated this week
- ☆51 Updated last week
- ☆34 Updated 2 months ago
- ☆67 Updated 3 months ago
- extensible collectives library in triton ☆83 Updated 4 months ago
- ☆23 Updated 10 months ago
- ☆23 Updated 2 months ago
- ☆23 Updated 2 months ago
- A schedule language for large model training ☆144 Updated 8 months ago
- MLIR-based partitioning system ☆62 Updated this week
- Example code for AWS Neuron SDK developers building inference and training applications ☆135 Updated last week
- Home for OctoML PyTorch Profiler ☆107 Updated last year
- TORCH_LOGS parser for PT2 ☆32 Updated this week
- EFA/NCCL base AMI build Packer and CodeBuild/Pipeline files. Also base Docker build files to enable EFA/NCCL in containers ☆42 Updated last year
- ☆14 Updated 3 years ago
- ☆25 Updated last month
- TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles. ☆55 Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆61 Updated 3 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆89 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 9 months ago
- ☆44 Updated last year
- ☆14 Updated last year
- ☆102 Updated last month
- ☆59 Updated 2 weeks ago
- ☆43 Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆172 Updated last week
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆34 Updated 9 months ago
- ☆70 Updated 2 months ago
- A sandbox for quick iteration and experimentation on projects related to IREE, MLIR, and LLVM ☆56 Updated last week
- Collection of kernels written in Triton language ☆103 Updated this week