apd10 / RzLinear
A compressed alternative to matrix multiplication using state-of-the-art ROBE-Z compression
☆9 · Updated last year
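The idea behind a hashed-compression linear layer can be sketched as follows. This is a minimal illustrative sketch, not the actual RzLinear API: it assumes a ROBE-style scheme in which every weight entry is looked up in a small shared parameter array through a deterministic hash (here simulated with a seeded random index map), so the full weight matrix is never stored as independent parameters. All names below (`robe_style_linear`, `params`) are hypothetical.

```python
import numpy as np

def robe_style_linear(x, params, out_features, seed=0):
    """Sketch of a hashed compressed linear layer (illustrative, not RzLinear).

    Each (i, j) entry of the virtual out_features x in_features weight
    matrix is fetched from the small shared array `params` via a
    deterministic index map, with a random sign to decorrelate collisions.
    """
    in_features = x.shape[-1]
    rng = np.random.default_rng(seed)  # stands in for the hash function
    idx = rng.integers(0, params.size, size=(out_features, in_features))
    sign = rng.choice([-1.0, 1.0], size=(out_features, in_features))
    W = sign * params[idx]  # materialized view of the compressed weights
    return x @ W.T

# A 64-parameter array stands in for a 32x16 = 512-entry weight matrix,
# an 8x compression of the layer's memory footprint.
params = np.random.default_rng(1).normal(size=64)
y = robe_style_linear(np.ones((2, 16)), params, out_features=32)
```

In a real implementation the index map is computed on the fly by a cheap hash inside the kernel rather than materialized, which is where the memory savings come from; gradients flow into the shared `params` array, so colliding entries are trained jointly.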
Alternatives and similar repositories for RzLinear:
Users interested in RzLinear are comparing it to the repositories listed below.
- ☆15 · Updated 3 years ago
- ☆32 · Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- ☆104 · Updated 8 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆45 · Updated 9 months ago
- Research and development for optimizing transformers ☆126 · Updated 4 years ago
- Unit Scaling demo and experimentation code ☆16 · Updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity ☆73 · Updated 8 months ago
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆19 · Updated 9 months ago
- ☆21 · Updated last year
- ☆158 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆23 · Updated last week
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆48 · Updated last year
- Faster PyTorch bitsandbytes 4-bit FP4 nn.Linear ops ☆28 · Updated last year
- ☆143 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- sigma-MoE layer ☆18 · Updated last year
- ☆22 · Updated last year
- Inference framework for MoE layers based on TensorRT with Python bindings ☆41 · Updated 3 years ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆122 · Updated this week
- Fast sparse deep learning on CPUs ☆53 · Updated 2 years ago
- FlexAttention with FlashAttention-3 support ☆26 · Updated 7 months ago
- QuIP quantization ☆52 · Updated last year
- Make Triton easier ☆47 · Updated 10 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated 11 months ago
- Extensible collectives library in Triton ☆86 · Updated last month
- Repository for CPU kernel generation for LLM inference ☆26 · Updated last year
- GPU operators for sparse tensor operations ☆32 · Updated last year
- ☆20 · Updated 11 months ago
- Here we will test various linear attention designs. ☆60 · Updated last year