facebookresearch / Ternary_Binary_Transformer

ACL 2023

☆39

Alternatives and similar repositories for Ternary_Binary_Transformer

Users interested in Ternary_Binary_Transformer are comparing it to the repositories listed below.


HuangOwen / RoLoRA
[EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization
☆38 · Updated last year
lightmatter-ai / INT-FP-QSim
Flexible simulator for mixed precision and format simulation of LLMs and vision transformers.
☆51 · Updated 2 years ago
ScalingIntelligence / CATS
☆28 · Updated 10 months ago
ziplab / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆30 · Updated last year
Qualcomm-AI-research / gptvq
☆35 · Updated last year
Qualcomm-AI-research / lr-qat
☆46 · Updated 11 months ago
LiqunMa / FBI-LLM
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation
☆51 · Updated last month
TianjinYellow / StableSPAM
☆24 · Updated 6 months ago
ruikangliu / Quantized-Reasoning-Models
[COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"
☆54 · Updated 3 months ago
cmd2001 / KVTuner
KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
☆22 · Updated 4 months ago
fmfi-compbio / admm-pruning
☆29 · Updated last year
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94 · Updated 10 months ago
Intelligent-Computing-Lab-Panda / TesseraQ
☆22 · Updated 11 months ago
yuzhenmao / IceFormer
Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024).
☆25 · Updated 2 months ago
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67 · Updated last year
GATECH-EIC / Linearized-LLM
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35 · Updated last year
rayleizhu / vllm-ra
[ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts
☆40 · Updated last year
GATECH-EIC / ShiftAddLLM
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization
☆110 · Updated 11 months ago
wimh966 / outlier_suppression
The official PyTorch implementation of the NeurIPS 2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…
☆48 · Updated 3 years ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆155 · Updated last year
thu-ml / TetraJet-MXFP4Training
PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT model pre-training
☆30 · Updated 3 months ago
HuangOwen / QAT-ACS
[TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection"
☆34 · Updated last year
ilur98 / DGQ
Official code for Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14 · Updated last year
IST-DASLab / QIGen
Repository for CPU Kernel Generation for LLM Inference
☆26 · Updated 2 years ago
haochengxi / Train_Transformers_with_INT4
☆156 · Updated 2 years ago
IST-DASLab / SparseFinetuning
Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry
☆42 · Updated last year
facebookresearch / bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
☆111 · Updated 2 years ago
ModelTC / QLLM
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…
☆39 · Updated last year
mobiusml / low-rank-llama2
Low-Rank Llama Custom Training
☆23 · Updated last year
duterscmy / CD-MoE
Official PyTorch implementation of CD-MoE
☆12 · Updated 6 months ago