insoochung / transformer_bcq
BCQ tutorial for transformers
☆17 · Updated 2 years ago
Alternatives and similar repositories for transformer_bcq
Users interested in transformer_bcq are comparing it to the libraries listed below.
- Official implementation for Training LLMs with MXFP4 ☆79 · Updated 4 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆95 · Updated 2 years ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆94 · Updated last year
- ☆127 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆61 · Updated 10 months ago
- Code for studying the super weight in LLMs ☆115 · Updated 9 months ago
- ☆45 · Updated 9 months ago
- Easy and Efficient Quantization for Transformers ☆203 · Updated 2 months ago
- ☆81 · Updated last year
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop] ☆85 · Updated 11 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆119 · Updated last year
- ☆118 · Updated last year
- ☆69 · Updated last year
- ☆26 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆87 · Updated last year
- PyTorch/XLA SPMD test code on Google TPU ☆23 · Updated last year
- Awesome Triton Resources ☆33 · Updated 4 months ago
- Implementation of Infini-Transformer in PyTorch ☆111 · Updated 7 months ago
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆42 · Updated last year
- ☆55 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆85 · Updated last month
- Experiments on Multi-Head Latent Attention ☆95 · Updated last year
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models ☆29 · Updated last month
- ☆202 · Updated 8 months ago
- ☆29 · Updated 9 months ago
- ☆140 · Updated 6 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆161 · Updated 4 months ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization" ☆98 · Updated 3 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆91 · Updated 2 months ago
- DPO, but faster 🚀 ☆44 · Updated 8 months ago