pytorch / FBGEMMLinks

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

☆1,453

Alternatives and similar repositories for FBGEMM

Users that are interested in FBGEMM are comparing it to the libraries listed below

Sorting:

microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆994Updated last year
llvm / torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,652Updated this week
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆730Updated 2 years ago
tensorflow / runtime
A performant and modular runtime for TensorFlow
☆760Updated last month
dmlc / dlpack
common in-memory tensor structure
☆1,080Updated last week
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆923Updated this week
pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,364Updated this week
alibaba / BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
☆899Updated 9 months ago
pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,064Updated last year
tensorflow / mlir-hlo
☆422Updated last week
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆646Updated 3 years ago
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆881Updated this week
mlcommons / inference
Reference implementations of MLPerf™ inference benchmarks
☆1,475Updated this week
google / gemmlowp
Low-precision matrix multiplication
☆1,817Updated last year
pytorch / benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
☆988Updated this week
hidet-org / hidet
An open-source efficient deep learning framework/compiler, written in python.
☆731Updated last month
mlcommons / training
Reference implementations of MLPerf® training benchmarks
☆1,716Updated last month
openxla / stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
☆553Updated this week
ROCm / composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
☆472Updated last week
tensorflow / custom-op
Guide for building custom op for TensorFlow
☆382Updated 2 years ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆742Updated last week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆357Updated this week
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆273Updated 2 months ago
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆780Updated last year
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Bla…
☆2,834Updated this week
openai / blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
☆1,062Updated 2 years ago
msr-fiddle / pipedream
☆393Updated 2 years ago
pytorch / tvm
TVM integration into PyTorch
☆454Updated 5 years ago
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,795Updated 2 years ago
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆416Updated last week