merrymercy / Awesome-Efficient-LLM
A curated list for Efficient Large Language Models
☆11 · Updated last year
Alternatives and similar repositories for Awesome-Efficient-LLM
Users who are interested in Awesome-Efficient-LLM are comparing it to the libraries listed below.
- ☆19 · Updated 10 months ago
- ☆80 · Updated 6 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. ☆79 · Updated 2 weeks ago
- ☆33 · Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆28 · Updated 7 months ago
- DeeperGEMM: crazy optimized version ☆71 · Updated 3 months ago
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆109 · Updated 9 months ago
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ☆30 · Updated 8 months ago
- Debug print operator for cudagraph debugging ☆13 · Updated last year
- [ICLR 2025] TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆44 · Updated this week
- GPTQ inference TVM kernel ☆40 · Updated last year
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆18 · Updated 2 years ago
- Tile-based language built for AI computation across all scales ☆34 · Updated this week
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆51 · Updated 4 months ago
- GPU operators for sparse tensor operations ☆34 · Updated last year
- TensorRT LLM Benchmark Configuration ☆13 · Updated last year
- ☆75 · Updated 2 months ago
- An external memory allocator example for PyTorch. ☆14 · Updated 3 years ago
- ☆37 · Updated last year
- LLM-Inference-Bench ☆48 · Updated 3 weeks ago
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆93 · Updated last month
- ☆44 · Updated last week
- Quantized Attention on GPU ☆44 · Updated 8 months ago
- ☆21 · Updated last week
- ☆60 · Updated 3 months ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆137 · Updated 2 years ago
- ☆50 · Updated 2 months ago
- Summary of system papers/frameworks/codes/tools on training or serving large models ☆57 · Updated last year
- ☆150 · Updated last year
- TritonBench: Benchmarking Large Language Model Capabilities for Generating Triton Operators ☆73 · Updated last month