henloitsjoyce / psychic-garbanzo
☆10 · Updated 4 months ago
Alternatives and similar repositories for psychic-garbanzo
Users interested in psychic-garbanzo are comparing it to the libraries listed below.
- Inference code for Mistral and Mixtral hacked up into original Llama implementation ☆371 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ☆810 · Updated this week
- Pybind11 bindings for Whisper.cpp ☆331 · Updated 5 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆363 · Updated last year
- [ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs ☆805 · Updated 7 months ago
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" ☆793 · Updated 8 months ago
- MobiLlama: Small Language Model tailored for edge devices ☆638 · Updated last week
- Extend existing LLMs way beyond the original training length with constant memory usage, without retraining ☆698 · Updated last year
- A simple and effective LLM pruning approach ☆746 · Updated 9 months ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers" ☆2,104 · Updated last year
- Plug-and-play implementation of Attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" ☆704 · Updated last year
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆653 · Updated 11 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,248 · Updated 2 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] Speeds up long-context LLM inference with approximate, dynamic sparse computation of the attention… ☆1,026 · Updated this week
- Rust full node implementation of the Fuel v2 protocol ☆57,651 · Updated this week
- A bagel, with everything ☆320 · Updated last year
- TinyChatEngine: On-Device LLM Inference Library ☆848 · Updated 10 months ago
- Codebase for Merging Language Models (ICML 2024) ☆822 · Updated last year
- Fine-tune mistral-7B on 3090s, a100s, h100s ☆711 · Updated last year
- Fuel Network Rust SDK ☆43,770 · Updated this week
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆725 · Updated 7 months ago
- Calculate tokens/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,304 · Updated 5 months ago