graphcore-research / unit-scaling-demo

Unit Scaling demo and experimentation code

☆16

Alternatives and similar repositories for unit-scaling-demo

Users who are interested in unit-scaling-demo are comparing it to the libraries listed below.

IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 · Updated 2 years ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆54 · Updated last year
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42 · Updated last year
IST-DASLab / MicroAdam
This repository contains code for the MicroAdam paper.
☆20 · Updated 7 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22 · Updated last year
Ryu1845 / hyena-jax
Implementation of Hyena Hierarchy in JAX
☆10 · Updated 2 years ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 · Updated 9 months ago
UmerHA / triton_util
Make triton easier
☆47 · Updated last year
deepspeedai / DeepSpeed-Kernels
☆74 · Updated 3 months ago
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆44 · Updated 9 months ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆85 · Updated last year
ylsung / rsq
Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"
☆18 · Updated last month
facebookresearch / Ternary_Binary_Transformer
ACL 2023
☆39 · Updated 2 years ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆83 · Updated 3 weeks ago
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆32 · Updated 2 months ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆49 · Updated last year
prateeky2806 / ComPEFT
☆26 · Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆152 · Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆77 · Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 · Updated last year
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆48 · Updated 2 years ago
tanyuqian / redco
NAACL '24 (Best Demo Paper RunnerUp) / MlSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference
☆66 · Updated 7 months ago
tridao / flash-attention-wheels
☆51 · Updated last year
ScalingIntelligence / CATS
☆26 · Updated 8 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80 · Updated 10 months ago
mayank31398 / ladder-residual-inference
☆14 · Updated this week
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆60 · Updated last year
mobiusml / low-rank-llama2
Low-Rank Llama Custom Training
☆23 · Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆59 · Updated 9 months ago
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆109 · Updated 9 months ago