graphcore-research / unit-scaling-demo
Unit Scaling demo and experimentation code
☆16 · Updated last year
Alternatives and similar repositories for unit-scaling-demo
Users interested in unit-scaling-demo are comparing it to the repositories listed below.
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- QuIP quantization ☆57 · Updated last year
- This repository contains code for the MicroAdam paper ☆19 · Updated 8 months ago
- Linear Attention Sequence Parallelism (LASP) ☆86 · Updated last year
- Flexible simulator for mixed-precision and format simulation of LLMs and vision transformers ☆51 · Updated 2 years ago
- ☆20 · Updated 4 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆76 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated 10 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆49 · Updated 2 years ago
- The evaluation framework for training-free sparse attention in LLMs ☆91 · Updated 2 months ago
- ☆74 · Updated 5 months ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 11 months ago
- ☆55 · Updated last year
- Boosting 4-bit inference kernels with 2:4 sparsity ☆80 · Updated 11 months ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference ☆52 · Updated 9 months ago
- Quantized Attention on GPU ☆44 · Updated 9 months ago
- ☆110 · Updated last year
- Transformers components but in Triton ☆34 · Updated 3 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated last month
- Dynamic Context Selection for Efficient Long-Context LLMs ☆38 · Updated 3 months ago
- ☆52 · Updated 2 months ago
- Low-Rank Llama Custom Training ☆23 · Updated last year
- Fast and memory-efficient exact attention ☆70 · Updated 5 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆61 · Updated 10 months ago
- ☆22 · Updated 5 months ago
- Implementation of Hyena Hierarchy in JAX ☆10 · Updated 2 years ago
- Cascade Speculative Drafting ☆29 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆40 · Updated last year
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs" ☆19 · Updated 2 months ago