csarofeen / pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
☆26 · Updated last year
Related projects
Alternatives and complementary repositories for pytorch
- Automatically insert nvtx ranges to PyTorch models ☆17 · Updated 3 years ago
- ☆36 · Updated 5 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆268 · Updated this week
- ☆48 · Updated 8 months ago
- Codebase associated with the PyTorch compiler tutorial ☆44 · Updated 5 years ago
- Customized matrix multiplication kernels ☆53 · Updated 2 years ago
- oneCCL Bindings for Pytorch* ☆86 · Updated last week
- MLIR-based partitioning system ☆36 · Updated this week
- A tracing JIT for PyTorch ☆17 · Updated 2 years ago
- PyTorch RFCs (experimental) ☆127 · Updated 2 months ago
- A lightweight, Pythonic frontend for MLIR ☆79 · Updated last year
- GEMM and Winograd based convolutions using CUTLASS ☆25 · Updated 4 years ago
- MatMul performance benchmarks for a single CPU core, comparing both hand-engineered and codegen kernels ☆123 · Updated last year
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- Shared Middle-Layer for Triton Compilation ☆185 · Updated this week
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆30 · Updated 3 months ago
- ☆128 · Updated this week
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆63 · Updated 2 years ago
- ☆147 · Updated 4 months ago
- ☆67 · Updated last year
- An extension library of WMMA API (Tensor Core API) ☆82 · Updated 3 months ago
- Benchmarks to capture important workloads ☆28 · Updated 5 months ago
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆57 · Updated this week
- ☆14 · Updated last month
- ☆57 · Updated this week
- ☆140 · Updated last year
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆65 · Updated last year
- modified cutlass ☆14 · Updated 4 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆59 · Updated 6 years ago
- Stretching GPU performance for GEMMs and tensor contractions ☆220 · Updated this week