apd10 / RzLinear
A compressed alternative to matrix multiplication using the state-of-the-art ROBE-Z compression
☆9 Updated last year
Alternatives and similar repositories for RzLinear:
Users who are interested in RzLinear are comparing it to the libraries listed below:
- ☆15 Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 Updated 10 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 Updated 9 months ago
- ☆22 Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆47 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated last year
- sigma-MoE layer ☆18 Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆72 Updated 7 months ago
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆19 Updated 8 months ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 Updated last week
- Experiment of using Tangent to autodiff triton ☆78 Updated last year
- ☆11 Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆40 Updated last year
- ACL 2023 ☆39 Updated last year
- Unit Scaling demo and experimentation code ☆16 Updated last year
- ☆143 Updated last year
- ☆68 Updated 3 weeks ago
- ☆103 Updated 7 months ago
- RWKV model implementation ☆37 Updated last year
- QuIP quantization ☆51 Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆81 Updated 10 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- Awesome Triton Resources ☆23 Updated 2 weeks ago
- FlexAttention w/ FlashAttention3 Support ☆26 Updated 6 months ago
- ☆32 Updated last year
- JAX Scalify: end-to-end scaled arithmetics ☆16 Updated 5 months ago
- CHAI is a library for dynamic pruning of attention heads for efficient LLM inference. ☆13 Updated 4 months ago
- ☆29 Updated 2 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆90 Updated last year
- ☆20 Updated this week