fire / pytorch-nncp
☆11 · Updated 2 years ago
Alternatives and similar repositories for pytorch-nncp
Users interested in pytorch-nncp are comparing it to the libraries listed below.
- This repository contains the source code and dataset link mentioned in the WWW 2022 accepted paper "TRACE: A Fast Transformer-based General-Pu… ☆30 · Updated 3 years ago
- Dzip: improved general-purpose lossless compression based on novel neural network modeling ☆72 · Updated 3 years ago
- ☆60 · Updated 7 months ago
- QuIP quantization ☆58 · Updated last year
- NN-based lossless compression ☆160 · Updated 3 years ago
- ☆69 · Updated last year
- An implementation of LLMzip using GPT-2 ☆13 · Updated 2 years ago
- ☆55 · Updated last week
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆70 · Updated 2 years ago
- Bamboo-7B Large Language Model ☆93 · Updated last year
- Training a reward model for RLHF using RWKV ☆15 · Updated 2 years ago
- Some preliminary explorations of Mamba's context scaling ☆14 · Updated 8 months ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆77 · Updated last week
- ☆54 · Updated 2 months ago
- GoldFinch and other hybrid transformer components ☆11 · Updated last month
- Official implementation of the paper "Linear Transformers with Learnable Kernel Functions are Better In-Context Models" ☆162 · Updated 7 months ago
- Repository for CPU kernel generation for LLM inference ☆26 · Updated 2 years ago
- SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia ☆41 · Updated 2 years ago
- RWKV-7: Surpassing GPT ☆94 · Updated 9 months ago
- ☆13 · Updated 2 years ago
- Framework-agnostic Python runtime for RWKV models ☆146 · Updated 2 years ago
- Simple high-throughput inference library ☆127 · Updated 3 months ago
- Faster distil-whisper transcription with CTranslate2 ☆14 · Updated last year
- [EMNLP Main '25] LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation ☆122 · Updated 3 months ago
- Inference code for LLaMA models ☆42 · Updated 2 years ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆39 · Updated 2 years ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆59 · Updated 10 months ago
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" ☆52 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for Whisper, https://mesolitica.com/blog/vllm-whisper ☆30 · Updated last year
- OpenVINO version of openai/whisper ☆174 · Updated last year
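Several of the repositories above (pytorch-nncp, Dzip, LLMzip, TRACE) share one core idea: drive an entropy coder with a predictive model's probabilities, so that symbols the model expects cost few bits. The toy sketch below is not taken from any of these repos; `neural_style_codelength` is a hypothetical helper that stands in for the neural predictor with a Laplace-smoothed byte-frequency model and reports the ideal code length, −log₂ p bits per byte:

```python
import math
from collections import Counter

def neural_style_codelength(data: bytes) -> float:
    """Ideal code length (in bits) when each byte is coded with the
    probability an adaptive model assigns to it just before seeing it.
    A Laplace-smoothed frequency counter stands in for a neural
    network's predictive distribution over the 256 byte values."""
    counts = Counter()
    total_bits = 0.0
    for i, b in enumerate(data):
        # probability the model assigns to symbol b before the update
        p = (counts[b] + 1) / (i + 256)
        total_bits += -math.log2(p)
        counts[b] += 1  # online update, analogous to NNCP's continual training
    return total_bits

msg = b"abracadabra abracadabra abracadabra"
bits = neural_style_codelength(msg)
print(f"{len(msg) * 8} raw bits -> {bits:.1f} coded bits")
```

Repetitive input quickly becomes cheap to code because the model's probabilities adapt; an actual compressor (as in Dzip or LLMzip) would feed these same per-symbol probabilities to an arithmetic coder to realize the predicted code length on disk.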