apd10 / RzLinear
A compressed alternative to matrix multiplication using the state-of-the-art ROBE-Z compression scheme
☆ 9 · Updated last year
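For intuition, below is a minimal sketch of the hashed-weight idea behind ROBE-Z-style compression: the full weight matrix is never stored as parameters; each entry is fetched from a small shared array via a fixed hash of its index. The class name `HashedLinear`, the hash constants, and the `memory_size` argument are illustrative assumptions, not RzLinear's actual API.

```python
# Minimal sketch of a hashed-weight linear layer (illustrative only;
# not RzLinear's implementation or API).
import torch
import torch.nn as nn

class HashedLinear(nn.Module):
    """Linear layer whose full weight matrix is never stored.

    Each virtual entry W[i, j] is read from a small shared parameter
    array `memory` at a position given by a fixed hash of (i, j).
    """
    def __init__(self, in_features, out_features, memory_size, seed=0):
        super().__init__()
        # The single shared 1-D array is the only trainable parameter.
        self.memory = nn.Parameter(torch.randn(memory_size) * 0.02)
        g = torch.Generator().manual_seed(seed)
        # Two random odd multipliers for a simple universal-style hash.
        a = torch.randint(1, 2**31 - 1, (2,), generator=g) * 2 + 1
        rows = torch.arange(in_features).unsqueeze(1)   # (in, 1)
        cols = torch.arange(out_features).unsqueeze(0)  # (1, out)
        # Fixed (non-learned) index map from every (i, j) into the shared array.
        self.register_buffer("index", (rows * a[0] + cols * a[1]) % memory_size)

    def forward(self, x):
        # Gather the virtual weight matrix from compressed memory, then matmul.
        W = self.memory[self.index]          # (in_features, out_features)
        return x @ W

# Usage: a 1024x1024 layer (~1M virtual weights) backed by 65,536 parameters.
layer = HashedLinear(1024, 1024, memory_size=65536)
y = layer(torch.randn(8, 1024))
```

This sketch materializes W for clarity; the appeal of a fused-kernel approach is to read the compressed array directly inside the matmul so the full matrix never exists in memory.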
Alternatives and similar repositories for RzLinear
Users interested in RzLinear are comparing it to the libraries listed below:
- sigma-MoE layer · ☆ 20 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 · ☆ 46 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆ 24 · Updated last year
- GoldFinch and other hybrid transformer components · ☆ 11 · Updated last month
- CUDA and Triton implementations of Flash Attention with SoftmaxN · ☆ 72 · Updated last year
- Utilities for Training Very Large Models · ☆ 58 · Updated 10 months ago
- QuIP quantization · ☆ 55 · Updated last year
- FlexAttention w/ FlashAttention3 support · ☆ 27 · Updated 10 months ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers · ☆ 13 · Updated 4 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆ 38 · Updated last year
- JAX Scalify: end-to-end scaled arithmetic · ☆ 16 · Updated 9 months ago
- Repository for CPU Kernel Generation for LLM Inference · ☆ 26 · Updated 2 years ago
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)" · ☆ 19 · Updated 2 years ago
- JAX implementations of RWKV · ☆ 19 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid · ☆ 88 · Updated last year
- See https://github.com/cuda-mode/triton-index/ instead! · ☆ 11 · Updated last year
- Experiment of using Tangent to autodiff Triton · ☆ 80 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… · ☆ 24 · Updated last month
- Official Repository for Efficient Linear-Time Attention Transformers · ☆ 18 · Updated last year