tflowdev / upgraded-happiness

☆10

Alternatives and similar repositories for upgraded-happiness

Users who are interested in upgraded-happiness are comparing it to the libraries listed below.

henloitsjoyce / psychic-garbanzo
☆10 Updated last year
nectere-sdk / congenial-goggles
☆10 Updated last year
wjytt / bug-free-pancake
☆10 Updated last year
Vahe1994 / AQLM
Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p…
☆1,315 Updated 6 months ago
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆912 Updated last month
Cornell-RelaxML / quip-sharp
☆577 Updated last year
apoorvumang / prompt-lookup-decoding
☆593 Updated last year
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆261 Updated last year
marella / ctransformers
Python bindings for the Transformer models implemented in C/C++ using GGML library.
☆1,879 Updated 2 years ago
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆396 Updated last year
huggingface / optimum-nvidia
☆1,029 Updated last year
AnswerDotAI / fsdp_qlora
Training LLMs with QLoRA + FSDP
☆1,539 Updated last year
microsoft / TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
☆455 Updated last year
aphrodite-engine / aphrodite-engine
Large-scale LLM inference engine
☆1,647 Updated 2 weeks ago
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆713 Updated last year
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆888 Updated 2 months ago
jondurbin / bagel
A bagel, with everything.
☆326 Updated last year
IST-DASLab / marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
☆1,011 Updated last year
Leeroo-AI / mergoo
A library for easily merging multiple LLM experts, and efficiently train the merged LLM.
☆505 Updated last year
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆426 Updated 2 years ago
SkunkworksAI / hydra-moe
☆416 Updated 2 years ago
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆518 Updated last year
Vahe1994 / SpQR
☆553 Updated last year
jondurbin / airoboros
Customizable implementation of the self-instruct paper.
☆1,049 Updated last year
turboderp / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆2,908 Updated 2 years ago
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆616 Updated last year
redotvideo / mamba-chat
Mamba-Chat: A chat LLM based on the state-space model architecture 🐍
☆939 Updated last year
QuixiAI / laserRMT
This is our own implementation of 'Layer Selective Rank Reduction'
☆240 Updated last year
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆737 Updated last year
turboderp-org / exllamav3
An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs
☆626 Updated 2 weeks ago