aws-neuron / nki-samples
☆26 Updated last week
Alternatives and similar repositories for nki-samples:
Users who are interested in nki-samples are comparing it to the libraries listed below.
- ☆11 Updated this week
- ☆51 Updated this week
- ☆34 Updated last month
- ☆23 Updated 2 months ago
- A schedule language for large model training ☆144 Updated 8 months ago
- ☆23 Updated 2 months ago
- extensible collectives library in triton ☆82 Updated 4 months ago
- ☆23 Updated 10 months ago
- ☆67 Updated 3 months ago
- ☆50 Updated 8 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆58 Updated 3 weeks ago
- Example code for AWS Neuron SDK developers building inference and training applications ☆135 Updated last week
- Home for OctoML PyTorch Profiler ☆107 Updated last year
- ☆14 Updated 3 years ago
- Cavs: An Efficient Runtime System for Dynamic Neural Networks ☆14 Updated 4 years ago
- EFA/NCCL base AMI build Packer and CodeBuild/Pipeline files. Also base Docker build files to enable EFA/NCCL in containers ☆42 Updated last year
- ☆44 Updated last year
- Distributed preprocessing and data loading for language datasets ☆39 Updated 10 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 Updated 2 months ago
- ☆102 Updated last month
- ☆99 Updated 5 months ago
- ☆24 Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆166 Updated last week
- ☆14 Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆187 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 9 months ago
- The (open-source part of) code to reproduce "BPPSA: Scaling Back-propagation by Parallel Scan Algorithm". ☆13 Updated 3 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Updated 5 years ago