linkedin / QuantEase

QuantEase, a layer-wise quantization framework, frames the problem as discrete-structured non-convex optimization. Our work leverages Coordinate Descent techniques, offering high-quality solutions without the need for matrix inversion or decomposition.
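To make the coordinate-descent idea concrete, here is a minimal Python sketch. It is not the QuantEase repository's code or API. It assumes the usual layer-wise objective ||WX - ŴX||_F^2, where W holds the original weights and X holds calibration inputs. It also assumes a hypothetical per-row uniform min-max grid and starts from round-to-nearest. The names `gram`, `make_grid`, `row_loss` and `quantize_layer_cd` are illustrative. Each step solves a one-variable quadratic in closed form and snaps the result to the nearest grid level. The only division is by a diagonal entry of Σ = XXᵀ, so no matrix is inverted or decomposed.

```python
import random


def gram(X):
    """Sigma = X X^T for X stored as p rows (inputs) of n calibration samples."""
    p = len(X)
    return [[sum(a * b for a, b in zip(X[j], X[l])) for l in range(p)] for j in range(p)]


def make_grid(row, bits):
    """Hypothetical per-row uniform min-max grid; returns a nearest-level quantizer."""
    lo, hi = min(row), max(row)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0

    def quantize(v):
        k = max(0, min(levels, round((v - lo) / scale)))  # nearest level index, clamped
        return lo + scale * k

    return quantize


def row_loss(w, w_hat, sigma):
    """(w - w_hat)^T Sigma (w - w_hat): this row's share of ||WX - W_hat X||_F^2."""
    d = [a - b for a, b in zip(w, w_hat)]
    return sum(d[j] * sigma[j][l] * d[l] for j in range(len(d)) for l in range(len(d)))


def quantize_layer_cd(W, X, bits=3, passes=5):
    """Cyclic coordinate descent over quantized weights; output rows are independent."""
    sigma = gram(X)
    W_hat = []
    for w in W:
        q = make_grid(w, bits)
        w_hat = [q(v) for v in w]  # round-to-nearest starting point
        p = len(w)
        for _ in range(passes):
            for j in range(p):
                if sigma[j][j] <= 0:
                    continue  # input j is always zero, so w_hat[j] does not affect the loss
                # Cross term from the other coordinates' current rounding errors.
                cross = sum(sigma[j][l] * (w[l] - w_hat[l]) for l in range(p) if l != j)
                # Unconstrained 1-D minimizer: a scalar division, no matrix inverse.
                target = w[j] + cross / sigma[j][j]
                # The loss is a convex parabola in w_hat[j], so the nearest grid level is optimal.
                w_hat[j] = q(target)
        W_hat.append(w_hat)
    return W_hat


if __name__ == "__main__":
    random.seed(0)
    W = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]   # 4 outputs x 8 inputs
    X = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]  # 8 inputs x 32 samples
    sigma = gram(X)
    W_cd = quantize_layer_cd(W, X, bits=3)
    rtn = sum(row_loss(w, [make_grid(w, 3)(v) for v in w], sigma) for w in W)
    cd = sum(row_loss(w, wh, sigma) for w, wh in zip(W, W_cd))
    print(f"round-to-nearest error: {rtn:.3f}  coordinate descent error: {cd:.3f}")
```

Each step exactly minimizes the loss over one coordinate's grid. Because of this, the coordinate-descent error should not exceed the round-to-nearest error it starts from, up to floating-point rounding. The sketch omits refinements that a practical quantizer would likely add, such as per-group scales or a convergence check.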

☆17

Alternatives and similar repositories for QuantEase

Users who are interested in QuantEase are comparing it to the libraries listed below.


punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,062 · Updated last year
haoliuhl / ringattention
Large Context Attention
☆714 · Updated 4 months ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆1,898 · Updated this week
ScalingIntelligence / KernelBench
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
☆374 · Updated this week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆198 · Updated this week
kvignesh1420 / cot-icl-lab
[ACL 2025] CoT-ICL Lab: A Synthetic Framework for Studying Chain-of-Thought Learning from In-Context Demonstrations
☆10 · Updated 2 weeks ago
BobMcDear / attorch
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
☆546 · Updated this week
AI-Hypercomputer / JetStream
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel…
☆335 · Updated this week
google / paxml
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta…
☆499 · Updated 2 weeks ago
HazyResearch / aisys-building-blocks
Building blocks for foundation models.
☆502 · Updated last year
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆249 · Updated this week
google / aqt
☆310 · Updated 2 weeks ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,348 · Updated this week
tspeterkim / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆833 · Updated 5 months ago
mlops-discord / gpu-optimization-workshop
Slides, notes, and materials for the workshop
☆326 · Updated last year
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆376 · Updated this week
rwitten / HighPerfLLMs2024
☆487 · Updated 10 months ago
flexflow / flexflow-train
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
☆1,797 · Updated last week
google-deepmind / recurrentgemma
Open weights language model from Google DeepMind, based on Griffin.
☆639 · Updated last week
mirage-project / mirage
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
☆850 · Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆271 · Updated last week
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆619 · Updated last month
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆3,088 · Updated this week
ppl-ai / pplx-kernels
Perplexity GPU Kernels
☆324 · Updated 2 weeks ago
HazyResearch / ThunderKittens
Tile primitives for speedy kernels
☆2,420 · Updated this week
NVIDIA / kvpress
LLM KV cache compression made easy
☆493 · Updated 3 weeks ago
mit-han-lab / omniserve
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆690 · Updated 3 months ago
mlcommons / training
Reference implementations of MLPerf™ training benchmarks
☆1,675 · Updated 3 weeks ago
mlfoundations / open_lm
A repository for research on medium sized language models.
☆497 · Updated last month
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,249 · Updated 3 months ago