henloitsjoyce / psychic-garbanzo
☆10 · Updated 6 months ago
Alternatives and similar repositories for psychic-garbanzo
Users interested in psychic-garbanzo are comparing it to the libraries listed below.
- ☆10 · Updated 6 months ago
- ☆10 · Updated 6 months ago
- ☆10 · Updated 6 months ago
- ☆548 · Updated 8 months ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆846 · Updated 2 weeks ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆239 · Updated last year
- A bagel, with everything. ☆323 · Updated last year
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆241 · Updated last year
- Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.p… ☆1,277 · Updated 2 months ago
- ☆416 · Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆697 · Updated 11 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers ☆423 · Updated last year
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆928 · Updated last year
- ☆864 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆374 · Updated last year
- ☆550 · Updated 11 months ago
- [ICLR 2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs. ☆828 · Updated 2 months ago
- The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction ☆388 · Updated last year
- Code for compression methods for transformers, accompanying our publications ☆436 · Updated 6 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆832 · Updated 8 months ago
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,260 · Updated 4 months ago
- Large-scale LLM inference engine ☆1,481 · Updated this week
- An optimized quantization and inference library for running LLMs locally on modern consumer-class GPUs ☆441 · Updated this week
- Inference code for Mistral and Mixtral hacked up into the original Llama implementation ☆371 · Updated last year
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" ☆816 · Updated 11 months ago
- ☆987 · Updated 5 months ago
- Ungreedy subword tokenizer and vocabulary trainer for Python, Go & JavaScript ☆591 · Updated last year
- A simple and effective LLM pruning approach. ☆777 · Updated 11 months ago
- ☆544 · Updated 7 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 9 months ago