google / ruy
☆303 · Updated last week
Related projects
Alternatives and complementary repositories for ruy
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels. ☆124 · Updated last year
- Conversion to/from half-precision floating point formats ☆333 · Updated 3 months ago
- ☆399 · Updated this week
- AMD's graph optimization engine. ☆186 · Updated this week
- Conversions to MLIR EmitC ☆124 · Updated 2 months ago
- Stretching GPU performance for GEMMs and tensor contractions. ☆223 · Updated this week
- heterogeneity-aware-lowering-and-optimization ☆253 · Updated 10 months ago
- portDNN is a library implementing neural network algorithms written using SYCL ☆108 · Updated 6 months ago
- Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure ☆767 · Updated this week
- Portable (POSIX/Windows/Emscripten) thread pool for C/C++ ☆353 · Updated 5 months ago
- Backward compatible ML compute opset inspired by HLO/MHLO ☆412 · Updated this week
- Library for specialized dense and sparse matrix operations, and deep learning primitives. ☆850 · Updated this week
- Intercept Layer for Debugging and Analyzing OpenCL Applications ☆314 · Updated 2 weeks ago
- Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain. ☆124 · Updated this week
- Assembler for NVIDIA Volta and Turing GPUs ☆201 · Updated 2 years ago
- A profiler to disclose and quantify hardware features on GPUs. ☆162 · Updated 2 years ago
- A performant and modular runtime for TensorFlow ☆756 · Updated last month
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). ☆518 · Updated 5 months ago
- Stores documents and resources used by the OpenXLA developer community ☆107 · Updated 3 months ago
- Symbolic Expression and Statement Module for new DSLs ☆206 · Updated 4 years ago
- CUDA Kernel Benchmarking Library ☆519 · Updated this week
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆313 · Updated this week
- Agenium Scale vectorization library for CPUs and GPUs ☆328 · Updated 3 years ago
- TPP experimentation on MLIR for linear algebra ☆110 · Updated last month
- ☆128 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆191 · Updated this week
- how to design cpu gemm on x86 with avx256, that can beat openblas. ☆66 · Updated 5 years ago
- Tests and benchmarks for cudnn (and in the future, other nvidia libraries) ☆53 · Updated 4 years ago
- An implementation of BLAS using the SYCL open standard. ☆259 · Updated 2 weeks ago
- ☆196 · Updated last year