fire / pytorch-nncp
☆12 · Updated 2 years ago
Alternatives and similar repositories for pytorch-nncp
Users who are interested in pytorch-nncp are comparing it to the libraries listed below.
- Dzip: improved general-purpose lossless compression based on novel neural network modeling ☆74 · Updated 3 years ago
- This repository contains the source code and dataset link mentioned in WWW 2022 accepted paper "TRACE: A Fast Transformer-based General-Pu… ☆30 · Updated 3 years ago
- An implementation of LLMzip using GPT-2 ☆13 · Updated 2 years ago
- ☆63 · Updated 10 months ago
- A collection of tools for neural compression enthusiasts. ☆582 · Updated last year
- Speed up the attention computation of Swin Transformer ☆25 · Updated 5 months ago
- GoldFinch and other hybrid transformer components ☆12 · Updated last month
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆390 · Updated last year
- QuIP quantization ☆61 · Updated last year
- ☆165 · Updated last year
- Low-bit optimizers for PyTorch ☆132 · Updated 2 years ago
- OpenVINO version of openai/whisper ☆178 · Updated 2 years ago
- ☆57 · Updated 2 years ago
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆31 · Updated last year
- Implementation of the conditionally routed attention in the CoLT5 architecture, in PyTorch ☆230 · Updated last year
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆87 · Updated 2 weeks ago
- Residual Quantization with Implicit Neural Codebooks ☆105 · Updated last month
- Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta ☆126 · Updated last month
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆333 · Updated 11 months ago
- ☆13 · Updated 2 years ago
- Unofficial PyTorch Implementation for pNLP-Mixer: an Efficient all-MLP Architecture for Language (https://arxiv.org/abs/2202.04350) ☆65 · Updated 3 years ago
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆131 · Updated 3 weeks ago
- Hugging Face compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,… ☆227 · Updated last year
- ☆166 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆243 · Updated 5 months ago
- ☆70 · Updated last year
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆106 · Updated 2 years ago
- A converter and basic tester for RWKV ONNX ☆43 · Updated last year
- Root Mean Square Layer Normalization ☆258 · Updated 2 years ago
- Get down and dirty with FlashAttention 2.0 in PyTorch: plug and play, no complex CUDA kernels ☆112 · Updated 2 years ago