chips-compilers-mlsys-21 / chips-compilers-mlsys-21.github.io
☆11Updated 4 years ago
Alternatives and similar repositories for chips-compilers-mlsys-21.github.io
Users that are interested in chips-compilers-mlsys-21.github.io are comparing it to the libraries listed below
- An external memory allocator example for PyTorch.☆16Updated 5 months ago
- ☆20Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆32Updated last year
- GPTQ inference TVM kernel☆40Updated last year
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆96Updated 4 months ago
- MSLK (Meta Superintelligence Labs Kernels) is a collection of PyTorch GPU operator libraries that are designed and optimized for GenAI tr…☆45Updated this week
- An Attention Superoptimizer☆22Updated last year
- study of cutlass☆22Updated last year
- ☆42Updated 2 years ago
- Benchmark scripts for TVM☆74Updated 3 years ago
- A Triton JIT runtime and ffi provider in C++☆31Updated last week
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆34Updated 11 months ago
- A demo of how to write a high-performance convolution that runs on Apple silicon☆57Updated 3 years ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆106Updated 7 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆113Updated last year
- An extension of TVMScript to write simple and high performance GPU kernels with tensorcore.☆51Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …☆65Updated 3 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections☆124Updated 3 years ago
- Benchmark PyTorch Custom Operators☆14Updated 2 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Updated last year
- play gemm with tvm☆92Updated 2 years ago
- study of Ampere's Sparse Matmul☆18Updated 5 years ago
- ☆15Updated 3 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Updated last year
- System for automated integration of deep learning backends.☆47Updated 3 years ago
- DeeperGEMM: crazy optimized version☆73Updated 9 months ago
- DietCode Code Release☆65Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Updated last year
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆192Updated last year
- Canvas: End-to-End Kernel Architecture Search in Neural Networks☆27Updated last year