insoochung / transformer_bcq
BCQ tutorial for transformers
☆17 · Updated 2 years ago
Alternatives and similar repositories for transformer_bcq
Users interested in transformer_bcq are comparing it to the libraries listed below.
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆96 · Updated 2 years ago
- Easy and Efficient Quantization for Transformers ☆202 · Updated 4 months ago
- ☆83 · Updated last year
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- ☆127 · Updated last year
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆94 · Updated last year
- A hackable, simple, and research-friendly GRPO Training Framework with high-speed weight synchronization in a multi-node environment. ☆31 · Updated 2 months ago
- PB-LLM: Partially Binarized Large Language Models ☆156 · Updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training ☆13 · Updated 10 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- ☆27 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆92 · Updated 3 months ago
- Code for studying the super weight in LLM ☆120 · Updated 10 months ago
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆36 · Updated last year
- Official implementation for Training LLMs with MXFP4 ☆100 · Updated 6 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆80 · Updated last year
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆31 · Updated 2 months ago
- Low-bit optimizers for PyTorch ☆132 · Updated 2 years ago
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] ☆88 · Updated last year
- ☆26 · Updated last year
- ☆121 · Updated last year
- OSLO: Open Source for Large-scale Optimization ☆174 · Updated 2 years ago
- ☆156 · Updated 2 years ago
- ☆57 · Updated last year
- ☆69 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆110 · Updated last year
- Experiments on Multi-Head Latent Attention ☆97 · Updated last year
- Pytorch/XLA SPMD Test code in Google TPU ☆23 · Updated last year