topal-team / rockmate
☆36 Updated last year
Alternatives and similar repositories for rockmate
Users who are interested in rockmate are comparing it to the libraries listed below.
- ☆10 Updated 3 years ago
- ☆221 Updated 2 years ago
- ☆159 Updated 2 years ago
- ☆31 Updated last year
- Butterfly matrix multiplication in PyTorch ☆177 Updated 2 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆264 Updated last month
- Collection of kernels written in Triton language ☆173 Updated 8 months ago
- Sparsity support for PyTorch ☆37 Updated 8 months ago
- ☆113 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆112 Updated last year
- Experiment of using Tangent to autodiff triton ☆81 Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- ☆37 Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆65 Updated last week
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆137 Updated 3 years ago
- pytorch-profiler ☆51 Updated 2 years ago
- A collection of research papers on efficient training of DNNs ☆70 Updated 3 years ago
- extensible collectives library in triton ☆91 Updated 8 months ago
- ☆185 Updated last year
- Distributed K-FAC preconditioner for PyTorch ☆93 Updated this week
- ☆43 Updated last year
- Code for the paper “Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling” ☆66 Updated last week
- ☆234 Updated 10 months ago
- A library for unit scaling in PyTorch ☆132 Updated 5 months ago
- ☆27 Updated 2 years ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware. ☆112 Updated last year
- ☆60 Updated last year
- ☆167 Updated 2 years ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆51 Updated 2 years ago
- Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight… ☆63 Updated last year