insoochung / transformer_bcq
BCQ tutorial for transformers
☆18, updated 2 years ago
Alternatives and similar repositories for transformer_bcq
Users interested in transformer_bcq are comparing it to the libraries listed below.
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (☆98, updated 2 years ago)
- ☆128, updated last year
- Code for studying the super weight in LLM (☆121, updated last year)
- ☆83, updated 2 years ago
- Easy and Efficient Quantization for Transformers (☆203, updated 5 months ago)
- [NeurIPS'23] Speculative Decoding with Big Little Decoder (☆95, updated last year)
- ☆27, updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … (☆60, updated last year)
- Official implementation for Training LLMs with MXFP4 (☆112, updated 7 months ago)
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference (☆118, updated last year)
- ☆10, updated last year
- ☆50, updated last year
- The evaluation framework for training-free sparse attention in LLMs (☆106, updated 2 months ago)
- PB-LLM: Partially Binarized Large Language Models (☆157, updated 2 years ago)
- ☆121, updated last year
- ☆155, updated 10 months ago
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models (☆34, updated last month)
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) (☆64, updated last year)
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] (☆89, updated last year)
- Awesome Triton Resources (☆38, updated 7 months ago)
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry (☆42, updated last year)
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference (☆91, updated 4 months ago)
- ☆31, updated last year
- Intel Gaudi's Megatron DeepSpeed Large Language Models for training (☆15, updated 11 months ago)
- Example ML projects that use the Determined library (☆32, updated last year)
- ☆101, updated last year
- Quantize transformers to any learned arbitrary 4-bit numeric format (☆50, updated 5 months ago)
- ☆47, updated last year
- Experiments on Multi-Head Latent Attention (☆99, updated last year)
- ☆70, updated last year