astramind-ai / BitMat

An efficient implementation of the method proposed in "The Era of 1-bit LLMs"

☆154

Alternatives and similar repositories for BitMat

Users who are interested in BitMat are comparing it to the libraries listed below.

VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
FasterDecoding / BitDelta
☆202 Updated 10 months ago
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆156 Updated last year
QuixiAI / grokadamw
☆136 Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248 Updated 8 months ago
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆249 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆98 Updated 11 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185 Updated 9 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 10 months ago
chu-tianxiang / QuIP-for-all
QuIP quantization
☆59 Updated last year
qwopqwop200 / gptqlora
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ
☆102 Updated 2 years ago
rafacelente / bllama
1.58-bit LLaMa model
☆83 Updated last year
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆385 Updated last year
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆266 Updated last year
kroggen / mamba.c
Inference of Mamba models in pure C
☆192 Updated last year
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36 Updated 2 weeks ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆125 Updated 2 years ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆92 Updated 5 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆86 Updated 6 months ago
Cornell-RelaxML / qtip
☆152 Updated 4 months ago
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110 Updated last year
sebulo / LoQT
☆80 Updated 11 months ago
keeeeenw / MicroLlama
Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget
☆161 Updated 2 months ago
llm-random / llm-random
☆200 Updated last month
vllm-project / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆180 Updated this week
euclaise / supertrainer2000
☆50 Updated last year
HanGuo97 / lq-lora
☆127 Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162 Updated 6 months ago