nod-ai / shark-ai

SHARK Inference Modeling and Serving

☆53

Alternatives and similar repositories for shark-ai

Users who are interested in shark-ai are comparing it to the libraries listed below.

nod-ai / SHARK-ModelDev
Unified compiler/runtime for interfacing with PyTorch Dynamo.
☆102Updated 2 months ago
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆219Updated this week
iree-org / iree-turbine
IREE's PyTorch Frontend, based on Torch Dynamo.
☆99Updated this week
ROCm / rocMLIR
☆157Updated this week
0xD0GF00D / DocumentSASS
Unofficial description of the CUDA assembly (SASS) instruction sets.
☆155Updated 3 months ago
ROCm / rocWMMA
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆136Updated last week
triton-lang / triton-cpu
An experimental CPU backend for Triton
☆155Updated last week
ROCm / aiter
AI Tensor Engine for ROCm
☆296Updated this week
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆231Updated 3 years ago
ROCm / triton
Development repository for the Triton language and compiler
☆136Updated last week
intel / mlir-extensions
Intel® Extension for MLIR. A staging ground for MLIR dialects and tools for Intel devices using the MLIR toolchain.
☆145Updated this week
gpuocelot / gpuocelot
GPUOcelot: A dynamic compilation framework for PTX
☆211Updated 8 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆302Updated last week
ROCm / iris
AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming
☆101Updated this week
ROCm / amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
☆119Updated 5 months ago
sunlex0717 / DissectingTensorCores
☆109Updated last year
ROCm / Tensile
[DEPRECATED] Moved to ROCm/rocm-libraries repo
☆254Updated last week
meta-pytorch / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆47Updated 2 months ago
ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆481Updated this week
NVlabs / NVBit
☆288Updated last month
intel / xetla
☆62Updated 10 months ago
microsoft / TileFusion
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
☆100Updated 4 months ago
carlushuang / gcnasm
amdgpu example code in hip/asm
☆45Updated this week
mmperf / mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
☆134Updated 2 years ago
tenstorrent / tt-mlir
Tenstorrent MLIR compiler
☆206Updated this week
seb-v / fp32_sgemm_amd
Super fast FP32 matrix multiplication on RDNA3
☆78Updated 7 months ago
nod-ai / iree-amd-aie
IREE plugin repository for the AMD AIE accelerator
☆112Updated this week
libxsmm / tpp-mlir
TPP experimentation on MLIR for linear algebra
☆137Updated last month
salykova / sgemm.cu
High-Performance SGEMM on CUDA devices
☆107Updated 9 months ago
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆98Updated 2 weeks ago