fire / pytorch-nncp
☆11 · Updated 2 years ago
Alternatives and similar repositories for pytorch-nncp
Users interested in pytorch-nncp are comparing it to the repositories listed below.
- Source code and dataset link for the WWW 2022 paper "TRACE: A Fast Transformer-based General-Pu… ☆30 · Updated 3 years ago
- Dzip: improved general-purpose lossless compression based on novel neural network modeling ☆72 · Updated 3 years ago
- ☆58 · Updated 6 months ago
- An implementation of LLMzip using GPT-2 ☆13 · Updated 2 years ago
- A converter and basic tester for RWKV ONNX ☆42 · Updated last year
- Implementation of Google's USM speech model in PyTorch ☆31 · Updated 3 weeks ago
- GoldFinch and other hybrid transformer components ☆11 · Updated last month
- ☆150 · Updated last year
- ☆95 · Updated last year
- ☆69 · Updated last year
- "Efficient Infinite Context Transformers with Infini-attention" PyTorch implementation + QwenMoE implementation + training script + 1M cont… ☆83 · Updated last year
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆74 · Updated 3 weeks ago
- An unofficial PyTorch implementation of "Efficient Infinite Context Transformers with Infini-attention" ☆53 · Updated 11 months ago
- Repository for CPU kernel generation for LLM inference ☆26 · Updated 2 years ago
- Large-scale distributed model training strategy with Colossal-AI and Lightning AI ☆56 · Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆38 · Updated 2 years ago
- Low-bit optimizers for PyTorch ☆130 · Updated last year
- PyTorch implementation of "Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation" ☆29 · Updated 3 years ago
- QuIP quantization ☆55 · Updated last year
- LiteASR: efficient automatic speech recognition with low-rank approximation ☆118 · Updated 2 months ago
- Framework-agnostic Python runtime for RWKV models ☆146 · Updated last year
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia by Summarizing Long Sequences" ☆70 · Updated 2 years ago
- (Unofficial) implementation of dilated attention from "LongNet: Scaling Transformers to 1,000,000,000 Tokens" (https://arxiv.org/abs/2307… ☆53 · Updated 2 years ago
- SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia ☆41 · Updated 2 years ago
- Personal experiments with routing tokens to different autoregressive attention modules, akin to mixture-of-experts ☆120 · Updated 9 months ago
- [ACL 2025 Main] Repository for the paper "500xCompressor: Generalized Prompt Compression for Large Language Models" ☆42 · Updated 2 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆115 · Updated this week
- The Open Parallel Corpus ☆75 · Updated 3 weeks ago
- 3x faster inference; unofficial implementation of EAGLE speculative decoding ☆75 · Updated last month
- Some common Hugging Face Transformers models in maximal update parametrization (µP) ☆82 · Updated 3 years ago