hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆148 · Updated last year
Related projects
Alternatives and complementary repositories for PB-LLM
- ☆122 · Updated 10 months ago
- An algorithm for static activation quantization of LLMs ☆79 · Updated 2 weeks ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆226 · Updated last month
- ☆96 · Updated 2 months ago
- [ICML 2024] KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache ☆243 · Updated last month
- ☆134 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆87 · Updated last month
- ☆184 · Updated last month
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆135 · Updated 5 months ago
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆79 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆262 · Updated last year
- ☆69 · Updated this week
- ☆199 · Updated 5 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆149 · Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (Official Code) ☆135 · Updated last month
- A repository dedicated to evaluating the performance of quantized LLaMA3 using various quantization methods. ☆166 · Updated 3 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 · Updated this week
- Reorder-based post-training quantization for large language models ☆181 · Updated last year
- Official PyTorch implementation of QA-LoRA ☆117 · Updated 8 months ago
- Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks ☆31 · Updated 4 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆97 · Updated last year
- QuIP quantization ☆46 · Updated 8 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆350 · Updated 8 months ago
- Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs ☆74 · Updated 5 months ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆74 · Updated last month
- (ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs ☆197 · Updated 5 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆305 · Updated 3 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆46 · Updated 4 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs ☆51 · Updated 6 months ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆167 · Updated 11 months ago