dmlc / dlpack

common in-memory tensor structure

☆1,042

Alternatives and similar repositories for dlpack

Users who are interested in dlpack are comparing it to the libraries listed below.

pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,332 Updated this week
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,415 Updated this week
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆726 Updated 2 years ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆992 Updated 10 months ago
tensorflow / runtime
A performant and modular runtime for TensorFlow
☆758 Updated 3 months ago
pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,056 Updated last year
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆259 Updated 2 years ago
zdevito / ATen
ATen: A TENsor library for C++11
☆703 Updated 5 years ago
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆842 Updated last week
tensorflow / mlir-hlo
☆420 Updated this week
pytorch / tvm
TVM integration into PyTorch
☆453 Updated 5 years ago
openxla / stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
☆510 Updated last week
tensorflow / custom-op
Guide for building custom op for TensorFlow
☆382 Updated 2 years ago
NVIDIA / NVTX
The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou…
☆427 Updated last week
rapidsai / rmm
RAPIDS Memory Manager
☆600 Updated this week
NVIDIA / cuCollections
☆557 Updated last week
inducer / loopy
A code generator for array-based code on CPUs and GPUs
☆609 Updated this week
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆887 Updated this week
tensor-compiler / taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
☆1,314 Updated 3 months ago
llvm / torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,591 Updated last week
d2l-ai / d2l-tvm
Dive into Deep Learning Compiler
☆646 Updated 3 years ago
pytorch / benchmark
TorchBench is a collection of open source benchmarks used to evaluate PyTorch performance.
☆966 Updated this week
pytorch / builder
Continuous builder and binary build scripts for pytorch
☆353 Updated 2 months ago
libxsmm / libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
☆888 Updated this week
NVIDIA / PyProf
A GPU performance profiling tool for PyTorch models
☆503 Updated 4 years ago
NVIDIA / nvbench
CUDA Kernel Benchmarking Library
☆691 Updated last week
ezyang / nvprof2json
Convert nvprof profiles into about:tracing compatible JSON files
☆70 Updated 4 years ago
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,765 Updated last year
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆345 Updated this week
nv-legate / cupynumeric
NumPy and SciPy on Multi-Node Multi-GPU systems
☆917 Updated last week