wjytt / bug-free-pancake
☆10 · Updated 3 months ago
Alternatives and similar repositories for bug-free-pancake:
Users who are interested in bug-free-pancake are comparing it to the libraries listed below.
- Accelerate and optimize performance with streamlined training and serving options in JAX. ☆263 · Updated this week
- Python bindings for whisper.cpp ☆242 · Updated last week
- Inference Llama 2 in one file of pure Python ☆415 · Updated 6 months ago
- Port of Facebook's LLaMA model in C/C++ ☆11 · Updated this week
- Stop messing around with finicky sampling parameters and just use DRµGS! ☆348 · Updated 10 months ago
- Fully fine-tune large models like Mistral, Llama-2-13B, or Qwen-14B for free ☆231 · Updated 5 months ago
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆278 · Updated this week
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆331 · Updated 10 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆786 · Updated last week
- ☆531 · Updated 5 months ago
- C++ implementation for 💫StarCoder ☆453 · Updated last year
- Minimalistic, extremely fast, and hackable researcher's toolbench for GPT models in 307 lines of code. Reaches <3.8 validation loss on wi… ☆345 · Updated 8 months ago
- whisper.cpp bindings for Python ☆94 · Updated last year
- llama-cpp-python-exploit ☆15 · Updated last year
- This is our own implementation of 'Layer Selective Rank Reduction' ☆235 · Updated 10 months ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆922 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 6 months ago
- Official repository for the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆555 · Updated 9 months ago
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆709 · Updated last year
- CI scripts designed to build a Pascal-compatible version of vLLM. ☆12 · Updated 8 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆232 · Updated last year
- Full finetuning of large language models without large memory requirements ☆94 · Updated last year
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆422 · Updated last year
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot". ☆785 · Updated 8 months ago
- Experimental BitNet Implementation ☆64 · Updated last year
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆693 · Updated last year
- Effort to open-source NLLB checkpoints. ☆444 · Updated 10 months ago