topal-team / rockmate
☆33 · Updated last year
Related projects
Alternatives and complementary repositories for rockmate
- Memory Optimizations for Deep Learning (ICML 2023) · ☆60 · Updated 8 months ago
- Fast Hadamard transform in CUDA, with a PyTorch interface · ☆111 · Updated 5 months ago
- [ICLR 2021] "CPT: Efficient Deep Neural Network Training via Cyclic Precision" by Yonggan Fu, Han Guo, Meng Li, Xin Yang, Yining Ding, Vi… · ☆30 · Updated 8 months ago
- Official implementation of the NeurIPS 2020 paper "Sparse Weight Activation Training" · ☆26 · Updated 3 years ago
- A library for unit scaling in PyTorch · ☆105 · Updated 2 weeks ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers · ☆43 · Updated last year
- DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training (ICLR 2023) · ☆30 · Updated last year
- pytorch-profiler · ☆50 · Updated last year
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… · ☆53 · Updated 8 months ago
- Official PyTorch Implementation of HELP: Hardware-adaptive Efficient Latency Prediction for NAS via Meta-Learning (NeurIPS 2021 Spotlight… · ☆60 · Updated 3 months ago
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ☆83 · Updated 3 months ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… · ☆31 · Updated last year
- The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… · ☆46 · Updated 2 years ago
- Code for ICML 2021 submission · ☆35 · Updated 3 years ago
- A collection of research papers on efficient training of DNNs · ☆68 · Updated 2 years ago
- PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware · ☆100 · Updated 11 months ago
- Extensible collectives library in Triton · ☆71 · Updated last month
- Collection of kernels written in the Triton language · ☆68 · Updated 3 weeks ago