apd10 / RzLinear
A compressed alternative to matrix multiplication using the state-of-the-art compression technique ROBE-Z
☆9Updated last year
Alternatives and similar repositories for RzLinear
Users who are interested in RzLinear are comparing it to the libraries listed below
- ☆15Updated 3 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆24Updated last year
- ☆22Updated last year
- CUDA and Triton implementations of Flash Attention with SoftmaxN.☆70Updated last year
- ☆21Updated 4 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆46Updated last year
- CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.☆17Updated 7 months ago
- ☆11Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…☆24Updated 3 weeks ago
- See https://github.com/cuda-mode/triton-index/ instead!☆11Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…☆34Updated this week
- ☆32Updated last year
- Let us make Psychohistory (as in Asimov) a reality, and accessible to everyone. Useful for LLM grounding and games / fiction / business /…☆40Updated 2 years ago
- Utilities for Training Very Large Models☆58Updated 9 months ago
- JAX implementations of RWKV☆19Updated last year
- GoldFinch and other hybrid transformer components☆10Updated last week
- FlexAttention w/ FlashAttention3 Support☆26Updated 9 months ago
- ☆21Updated 2 years ago
- sigma-MoE layer☆20Updated last year
- QuIP quantization☆54Updated last year
- ☆28Updated last year
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)"☆19Updated 2 years ago
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence☆59Updated 3 years ago
- ☆27Updated last year
- JAX Scalify: end-to-end scaled arithmetics☆16Updated 8 months ago
- Awesome Triton Resources☆32Updated 2 months ago
- Faster PyTorch bitsandbytes 4bit fp4 nn.Linear ops☆30Updated last year
- Implementation of Hyena Hierarchy in JAX☆10Updated 2 years ago
- BigKnow2022: Bringing Language Models Up to Speed☆15Updated 2 years ago
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024)☆33Updated last year