topal-team / rockmate
☆36 Updated 6 months ago
Alternatives and similar repositories for rockmate
Users who are interested in rockmate are comparing it to the libraries listed below.
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 Updated last year
- ☆10 Updated 3 years ago
- ☆51 Updated 11 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆167 Updated this week
- Experiment of using Tangent to autodiff triton ☆79 Updated last year
- ☆167 Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 11 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆100 Updated 3 weeks ago
- extensible collectives library in triton ☆86 Updated 2 months ago
- A library for unit scaling in PyTorch ☆125 Updated 7 months ago
- ☆43 Updated last year
- Sparsity support for PyTorch ☆35 Updated 3 months ago
- ☆157 Updated last year
- ☆105 Updated 10 months ago
- A Quirky Assortment of CuTe Kernels ☆117 Updated this week
- A block oriented training approach for inference time optimization. ☆33 Updated 10 months ago
- Collection of kernels written in Triton language ☆132 Updated 2 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆170 Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆77 Updated last week
- ☆208 Updated 2 years ago
- Complete GPU residency for ML. ☆17 Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆224 Updated 10 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆131 Updated last week
- Customized matrix multiplication kernels ☆56 Updated 3 years ago
- ☆42 Updated 2 years ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆63 Updated last year
- ☆31 Updated last year
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆39 Updated 2 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆132 Updated 3 years ago
- ☆28 Updated 5 months ago