fire / pytorch-nncp
☆11 · Updated 2 years ago
Alternatives and similar repositories for pytorch-nncp
Users interested in pytorch-nncp are comparing it to the libraries listed below.
- This repository contains the source code and dataset link mentioned in WWW 2022 accepted paper "TRACE: A Fast Transformer-based General-Pu… ☆30 · Updated 3 years ago
- Dzip: improved general-purpose lossless compression based on novel neural network modeling ☆72 · Updated 3 years ago
- ☆56 · Updated 6 months ago
- Code for Fast as CHITA: Neural Network Pruning with Combinatorial Optimization ☆12 · Updated last year
- ☆51 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆64 · Updated last week
- Repository for CPU Kernel Generation for LLM Inference ☆26 · Updated 2 years ago
- Here we collect trick questions and failed tasks for open source LLMs to improve them. ☆32 · Updated 2 years ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆87 · Updated last year
- ☆69 · Updated last year
- Pytorch Implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" ☆84 · Updated 3 months ago
- High-speed and easy-to-use LLM serving framework for local deployment ☆112 · Updated 3 months ago
- ☆146 · Updated last year
- ☆147 · Updated 2 years ago
- Pytorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at Deepmind ☆127 · Updated 10 months ago
- Here we will test various linear attention designs. ☆60 · Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers. ☆48 · Updated 2 years ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 · Updated 9 months ago
- Bamboo-7B Large Language Model ☆93 · Updated last year
- Efficient Infinite Context Transformers with Infini-attention Pytorch Implementation + QwenMoE Implementation + Training Script + 1M cont… ☆83 · Updated last year
- Get down and dirty with FlashAttention2.0 in PyTorch: plug and play, no complex CUDA kernels ☆105 · Updated last year
- Image super resolution models for PyTorch. ☆174 · Updated 2 months ago
- QuIP quantization ☆54 · Updated last year
- Thispersondoesnotexist went down, so this time, while building it back up, I am going to open source all of it. ☆90 · Updated last year
- ☆15 · Updated last year
- Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences" ☆71 · Updated 2 years ago
- RWKV, in easy to read code ☆72 · Updated 3 months ago
- Mamba training library developed by kotoba technologies ☆71 · Updated last year
- sigma-MoE layer ☆20 · Updated last year
- An implementation of LLMzip using GPT-2 ☆13 · Updated last year