mk1-project / quickreduce
☆22 · Updated last week
Alternatives and similar repositories for quickreduce:
Users who are interested in quickreduce are comparing it to the libraries listed below:
- ☆193 · Updated 8 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆314 · Updated this week
- Applied AI experiments and examples for PyTorch ☆250 · Updated last week
- ☆57 · Updated 3 months ago
- ☆192 · Updated this week
- Fast low-bit matmul kernels in Triton ☆272 · Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆102 · Updated 8 months ago
- ☆90 · Updated 2 weeks ago
- Fastest kernels written from scratch ☆202 · Updated 3 weeks ago
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆131 · Updated last week
- ☆73 · Updated 4 months ago
- CUDA Matrix Multiplication Optimization ☆177 · Updated 8 months ago
- Extensible collectives library in Triton ☆84 · Updated 6 months ago
- Experimental projects related to TensorRT ☆94 · Updated this week
- ☆92 · Updated 11 months ago
- A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores ☆50 · Updated last year
- An experimental CPU backend for Triton ☆101 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆235 · Updated last month
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆239 · Updated this week
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5). ☆242 · Updated 5 months ago
- A library to analyze PyTorch traces. ☆354 · Updated this week
- Cataloging released Triton kernels. ☆212 · Updated 2 months ago
- Shared Middle-Layer for Triton Compilation ☆233 · Updated 2 weeks ago
- A library of GPU kernels for sparse matrix operations. ☆260 · Updated 4 years ago
- OpenAI Triton backend for Intel® GPUs ☆170 · Updated this week
- Step-by-step optimization of CUDA SGEMM ☆294 · Updated 2 years ago
- PyTorch emulation library for Microscaling (MX)-compatible data formats ☆212 · Updated 6 months ago
- Development repository for the Triton language and compiler ☆114 · Updated this week
- Multi-GPU communication profiler and visualizer ☆27 · Updated 9 months ago
- Assembler for NVIDIA Volta and Turing GPUs ☆214 · Updated 3 years ago