apd10 / RzLinear
A compressed alternative to matrix multiplication using the state-of-the-art ROBE-Z compression technique
☆9 · Updated last year
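For intuition, below is a minimal PyTorch sketch of the ROBE-style idea behind a compressed linear layer: rather than storing the full weight matrix, each weight is fetched from a much smaller shared parameter array via a fixed hash of its index. The `HashedLinear` module, the hash constants, and the `memory_size` argument are illustrative assumptions, not the actual RzLinear API; a production implementation such as RzLinear fuses the hashed lookup into a custom matmul kernel instead of materializing the weight.

```python
import torch
import torch.nn as nn


class HashedLinear(nn.Module):
    """Illustrative ROBE-style compressed linear layer (not the RzLinear API).

    Weights are read from a small shared 1D parameter array through a cheap
    universal hash of each weight's flat (row, col) index, so the parameter
    count is memory_size instead of in_features * out_features.
    """

    def __init__(self, in_features, out_features, memory_size=4096, seed=0):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        # Compressed parameter store, much smaller than the full weight matrix.
        self.memory = nn.Parameter(torch.randn(memory_size) * 0.02)
        # Fixed (non-learned) hash of flat weight indices into the shared array.
        g = torch.Generator().manual_seed(seed)
        a = torch.randint(1, 2**31 - 1, (1,), generator=g).item()
        b = torch.randint(0, 2**31 - 1, (1,), generator=g).item()
        idx = torch.arange(in_features * out_features, dtype=torch.int64)
        hashed = (a * idx + b) % (2**31 - 1) % memory_size
        self.register_buffer("hashed_idx", hashed.view(out_features, in_features))

    def forward(self, x):
        # Materialize the virtual weight by gathering from the compressed store,
        # then fall back to an ordinary matmul. A fused kernel would avoid
        # materializing the weight at all.
        weight = self.memory[self.hashed_idx]
        return x @ weight.t()


if __name__ == "__main__":
    layer = HashedLinear(512, 256, memory_size=8192)
    y = layer(torch.randn(4, 512))
    print(y.shape)  # torch.Size([4, 256])
```

The gather in `forward` keeps gradients flowing into the shared array, so hashed positions that collide simply share (and co-train) the same parameter, which is what makes the compression lossy but trainable.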
Alternatives and similar repositories for RzLinear
Users interested in RzLinear are comparing it to the libraries listed below.
- ☆15 · Updated 3 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆64 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆44 · Updated 10 months ago
- Customized matrix multiplication kernels ☆54 · Updated 3 years ago
- Explore training for quantized models ☆18 · Updated this week
- Fast sparse deep learning on CPUs ☆53 · Updated 2 years ago
- ☆32 · Updated last year
- Research and development for optimizing transformers ☆126 · Updated 4 years ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆33 · Updated 2 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ☆109 · Updated 10 months ago
- pytorch-profiler ☆51 · Updated 2 years ago
- PyTorch-centric eager-mode debugger ☆47 · Updated 5 months ago
- ☆105 · Updated 9 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated this week
- ☆49 · Updated last year
- Extensible collectives library in Triton ☆87 · Updated 2 months ago
- ☆157 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated 2 years ago
- Code for the note "NF4 Isn't Information Theoretically Optimal (and that's Good)" ☆19 · Updated last year
- RWKV model implementation ☆38 · Updated last year
- Inference framework for MoE layers based on TensorRT with Python bindings ☆41 · Updated 4 years ago
- Utilities for Training Very Large Models ☆58 · Updated 8 months ago
- ☆36 · Updated 5 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆157 · Updated 6 months ago
- ☆26 · Updated last year
- ☆21 · Updated 3 months ago
- A GPT, made only of MLPs, in Jax ☆58 · Updated 3 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated last year
- QuIP quantization ☆52 · Updated last year
- Benchmarking different models with PyTorch 2.0 ☆21 · Updated 2 years ago