fire / pytorch-nncp
☆11 · Updated 2 years ago
Alternatives and similar repositories for pytorch-nncp
Users who are interested in pytorch-nncp are comparing it to the libraries listed below.
- Dzip: improved general-purpose lossless compression based on novel neural network modeling ☆73 · Updated 3 years ago
- An implementation of LLMzip using GPT-2 (see the sketch after this list) ☆13 · Updated 2 years ago
- This repository contains the source code and dataset link for the WWW 2022 paper "TRACE: A Fast Transformer-based General-Pu… ☆30 · Updated 3 years ago
- GoldFinch and other hybrid transformer components ☆12 · Updated 3 weeks ago
- Implementation of MambaByte from "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆123 · Updated 2 weeks ago
- ☆61 · Updated 9 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- flow-merge is a powerful Python library that enables seamless merging of multiple transformer-based language models using the most popula… ☆20 · Updated 8 months ago
- ☆157 · Updated last year
- QuIP quantization ☆59 · Updated last year
- Low-bit optimizers for PyTorch ☆131 · Updated 2 years ago
- Code for my ICLR 2024 Tiny Papers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models" ☆16 · Updated 2 years ago
- Here we will test various linear attention designs. ☆61 · Updated last year
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ☆465 · Updated last year
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆102 · Updated 2 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆99 · Updated last year
- SparseGPT + GPTQ compression of LLMs like LLaMA, OPT, and Pythia ☆40 · Updated 2 years ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆384 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆40 · Updated 2 years ago
- ☆57 · Updated last year
- ☆152 · Updated 3 months ago
- Inference code for LLaMA models ☆42 · Updated 2 years ago
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated last year
- Experiments with BitNet inference on CPU ☆54 · Updated last year
- A converter and basic tester for RWKV ONNX ☆42 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆66 · Updated last year
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)… ☆72 · Updated 2 years ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆144 · Updated last year
- ☆13 · Updated 2 years ago
- Implementation of a modular, high-performance, and simplistic Mamba for high-speed applications ☆36 · Updated 11 months ago
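
Several of the entries above share the core technique behind pytorch-nncp itself: a neural model predicts the next symbol, and an entropy coder exploits the skewed predictions (Dzip, LLMzip, and TRACE all follow this pattern). Below is a minimal sketch of that idea using GPT-2 token ranks, with zlib standing in for a real arithmetic coder; the model choice, the `text_to_ranks` helper, and the clamped byte packing are illustrative assumptions, not code from any listed repository.

```python
# Sketch only: turn text into model-assigned token ranks, then entropy-code
# the ranks. Requires `torch` and Hugging Face `transformers`.
import zlib

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def text_to_ranks(text: str) -> list[int]:
    """For each token after the first, record the rank the model assigned
    to it (0 = the model's top prediction)."""
    ids = tokenizer.encode(text, return_tensors="pt")   # (1, seq_len)
    with torch.no_grad():
        logits = model(ids).logits                      # (1, seq_len, vocab)
    ranks = []
    for pos in range(ids.shape[1] - 1):
        order = torch.argsort(logits[0, pos], descending=True)
        ranks.append((order == ids[0, pos + 1]).nonzero().item())
    return ranks

text = "Neural compressors pair a predictive model with an entropy coder."
ranks = text_to_ranks(text)

# A good model puts the true token near rank 0, so the rank stream is mostly
# small integers and compresses far better than the raw bytes do.
rank_bytes = bytes(min(r, 255) for r in ranks)  # toy packing; clamping makes this demo lossy
print("raw:", len(zlib.compress(text.encode())), "ranks:", len(zlib.compress(rank_bytes)))
```

A real codec in this family feeds the model's full next-symbol probability distribution into an arithmetic coder instead of clamped ranks, which is lossless and decodable as long as both sides run the identical model.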