hidet-org / hidetLinks

An open-source efficient deep learning framework/compiler, written in python.

☆710

Alternatives and similar repositories for hidet

Users that are interested in hidet are comparing it to the libraries listed below

Sorting:

pytorch / torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
☆1,056Updated last year
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆398Updated last week
openxla / stablehlo
Backward compatible ML compute opset inspired by HLO/MHLO
☆510Updated last week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆338Updated this week
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆563Updated 2 weeks ago
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆775Updated 11 months ago
albanD / subclass_zoo
☆171Updated last year
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289Updated 2 months ago
google / aqt
☆323Updated last month
pytorch / kineto
A CPU+GPU Profiling library that provides access to timeline traces and hardware performance counters.
☆842Updated last week
NVIDIA / Fuser
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
☆345Updated this week
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224Updated last year
Dao-AILab / quack
A Quirky Assortment of CuTe Kernels
☆374Updated this week
Deep-Learning-Profiling-Tools / triton-viz
☆227Updated this week
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273Updated 3 years ago
onnx / onnx-mlir
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
☆887Updated this week
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆653Updated 3 weeks ago
tspeterkim / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆887Updated 7 months ago
microsoft / triton-shared
Shared Middle-Layer for Triton Compilation
☆260Updated this week
pytorch-labs / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆200Updated this week
pranjalssh / fast.cu
Fastest kernels written from scratch
☆308Updated 3 months ago
microsoft / onnxscript
ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python.
☆369Updated this week
ColfaxResearch / cutlass-kernels
☆227Updated last year
gpu-mode / triton-index
Cataloging released Triton kernels.
☆246Updated 6 months ago
microsoft / varuna
☆251Updated last year
mlc-ai / mlc-en
☆422Updated 9 months ago
thuml / depyf
depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.
☆706Updated 3 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207Updated last week
llvm / torch-mlir
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
☆1,591Updated last week
pytorch / torchx
TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup…
☆378Updated this week