aws-neuron / nki-library
☆42 Updated 2 weeks ago
Alternatives and similar repositories for nki-library
Users who are interested in nki-library are comparing it to the libraries listed below.
- ☆15 Updated 3 months ago
- ☆13 Updated last year
- A Top-Down Profiler for GPU Applications ☆22 Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs. ☆32 Updated 10 months ago
- ☆13 Updated 8 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 Updated last year
- Official page for 18-847C (Spring '22): Data Center Computing ☆15 Updated 3 years ago
- A Triton-only attention backend for vLLM ☆23 Updated this week
- ☆53 Updated 9 months ago
- ☆13 Updated last year
- Benchmark tests supporting the TiledCUDA library. ☆18 Updated last year
- ☆23 Updated 6 months ago
- cuJSON: A Highly Parallel JSON Parser for GPUs ☆38 Updated last month
- Slides and exercises for persistent memory programming tutorial ☆14 Updated 3 years ago
- An experimental communicating attention kernel based on DeepEP. ☆35 Updated 6 months ago
- ☆13 Updated 2 years ago
- Tutorials for NVIDIA CUPTI samples ☆50 Updated 3 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 Updated last year
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆15 Updated 6 years ago
- ☆31 Updated 3 years ago
- ☆14 Updated last year
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆25 Updated 8 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated 2 months ago
- Sample Codes using NVSHMEM on Multi-GPU ☆30 Updated 3 years ago
- Wave: Python Domain-Specific Language for High Performance Machine Learning ☆42 Updated this week
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 Updated 2 years ago
- ☆77 Updated last year
- Simple python library for generating your own perfetto traces for your application. Can be used for both app instrumentation and custom … ☆24 Updated 7 months ago
- ☆25 Updated 2 months ago
- An Attention Superoptimizer ☆22 Updated last year