insoochung / transformer_bcqLinks

BCQ tutorial for transformers

☆17

Alternatives and similar repositories for transformer_bcq

Users that are interested in transformer_bcq are comparing it to the libraries listed below

Sorting:

jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆96Updated 2 years ago
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆202Updated 4 months ago
insuhan / hyper-attn
☆83Updated last year
SqueezeBits / QUICK
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
☆118Updated last year
HanGuo97 / lq-lora
☆127Updated last year
kssteven418 / BigLittleDecoder
[NeurIPS'23] Speculative Decoding with Big Little Decoder
☆94Updated last year
goddoe / RLYX
A hackable, simple, and reseach-friendly GRPO Training Framework with high speed weight synchronization in a multinode environment.
☆31Updated 2 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156Updated last year
HabanaAI / Megatron-DeepSpeed
Intel Gaudi's Megatron DeepSpeed Large Language Models for training
☆13Updated 10 months ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42Updated last year
vedantroy / gpu_kernels
☆27Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92Updated 3 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆120Updated 10 months ago
aniquetahir / JORA
JORA: JAX Tensor-Parallel LoRA Library (ACL 2024)
☆36Updated last year
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆100Updated 6 months ago
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26Updated 2 years ago
srush / triton-autodiff
Experiment of using Tangent to autodiff triton
☆80Updated last year
dmis-lab / Outlier-Safe-Pre-Training
[ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models
☆31Updated 2 months ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆132Updated 2 years ago
Nota-NetsPresso / shortened-llm
Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]
☆88Updated last year
prateeky2806 / ComPEFT
☆26Updated last year
mgmalek / efficient_cross_entropy
☆121Updated last year
EleutherAI / oslo
OSLO: Open Source for Large-scale Optimization
☆174Updated 2 years ago
haochengxi / Train_Transformers_with_INT4
☆156Updated 2 years ago
tridao / flash-attention-wheels
☆57Updated last year
Entropy-xcy / bitnet158
☆69Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆110Updated last year
ambisinister / mla-experiments
Experiments on Multi-Head Latent Attention
☆97Updated last year
HeegyuKim / torch-xla-SPMD
Pytorch/XLA SPMD Test code in Google TPU
☆23Updated last year