apd10 / RzLinear
A compressed alternative to matrix multiplication using the state-of-the-art ROBE-Z compression scheme.
☆9 · Updated last year
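For orientation only, the sketch below illustrates the general idea of a hashed, shared-parameter linear layer: the full weight matrix is never stored; each entry is looked up in a much smaller trainable array through a fixed hash of its index. This is not RzLinear's actual API: the class name `HashedLinear`, the per-entry random-index hash, and all parameter names are assumptions made for the example, and ROBE-Z itself hashes contiguous blocks with a universal hash family and fuses the lookup into the matmul kernel.

```python
import torch
import torch.nn as nn


class HashedLinear(nn.Module):
    """Toy hashed-weight linear layer (illustration only, not RzLinear's API).

    The only trainable parameters live in a small 1-D array `memory`;
    every entry of the virtual (out_features x in_features) weight matrix
    is fetched from that array via a fixed random index map.
    """

    def __init__(self, in_features, out_features, compressed_size, seed=0):
        super().__init__()
        # Shared parameter array, much smaller than in_features * out_features.
        self.memory = nn.Parameter(torch.randn(compressed_size) * 0.02)
        # Fixed, non-trainable "hash": one slot index per weight entry.
        # Plain random indices keep the sketch short; ROBE-Z uses a
        # universal hash over contiguous blocks instead.
        g = torch.Generator().manual_seed(seed)
        idx = torch.randint(compressed_size, (out_features, in_features), generator=g)
        self.register_buffer("idx", idx)

    def forward(self, x):
        # Gather the virtual weight matrix, then do an ordinary matmul.
        # A fused kernel would skip materializing the full matrix, which
        # is where the real memory/speed benefit comes from.
        w = self.memory[self.idx]  # (out_features, in_features)
        return x @ w.t()


# Usage: stand in for an nn.Linear(1024, 1024) (~1M weights) with only
# 64k shared parameters behind it.
layer = HashedLinear(1024, 1024, compressed_size=65536)
y = layer(torch.randn(8, 1024))
print(y.shape)  # torch.Size([8, 1024])
```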
Alternatives and similar repositories for RzLinear:
Users interested in RzLinear are comparing it to the libraries listed below.
- ☆15 · Updated 2 years ago
- CHAI is a library for dynamic pruning of attention heads for efficient LLM inference. ☆12 · Updated 3 months ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 · Updated last year
- Customized matrix multiplication kernels ☆53 · Updated 3 years ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆71 · Updated 6 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 8 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆46 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated last year
- ☆102 · Updated 7 months ago
- sigma-MoE layer ☆18 · Updated last year
- Explore training for quantized models ☆17 · Updated 2 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 · Updated last year
- ☆20 · Updated last year
- ☆26 · Updated last year
- GPU operators for sparse tensor operations ☆31 · Updated last year
- ☆19 · Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- Prototype routines for GPU quantization written using PyTorch. ☆20 · Updated last month
- ☆22 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last month
- ☆20 · Updated last year
- ☆63 · Updated this week
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆16 · Updated 5 months ago
- ☆33 · Updated last year
- Confident Adaptive Transformers ☆12 · Updated 3 years ago
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 3 years ago
- QuIP quantization ☆52 · Updated last year
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆48 · Updated last year
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning" ☆44 · Updated last year