DonRL10 / RetNet
An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf).
☆12 · Updated last year
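For reference, the retention mechanism this repository implements can be sketched in a few lines. Below is a minimal, single-head sketch (assumed simplification, not this repo's actual code) of the paper's parallel form, Retention(X) = (QKᵀ ⊙ D)V with decay mask D[n, m] = γ^(n−m) for n ≥ m, together with the mathematically equivalent recurrent form. It omits the paper's xPos rotation, multi-scale per-head decays, and group normalization; all function names here are illustrative.

```python
import numpy as np

def retention(q, k, v, gamma=0.9):
    """Parallel-form retention for one head (simplified sketch).

    q, k, v: (seq_len, d) arrays; gamma: decay factor in (0, 1).
    Computes (Q K^T * D) V, where D[n, m] = gamma**(n - m) for
    n >= m and 0 otherwise (a causal, exponentially decaying mask).
    """
    n = q.shape[0]
    idx = np.arange(n)
    # Causal decay matrix D: lower triangle holds gamma^(n - m)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (q @ k.T * D) @ v

def retention_recurrent(q, k, v, gamma=0.9):
    """Recurrent form: same output, O(1) state per step (for inference)."""
    d = q.shape[1]
    S = np.zeros((d, d))          # recurrent state S_n = gamma*S_{n-1} + k_n v_n^T
    out = np.zeros_like(v)
    for t in range(q.shape[0]):
        S = gamma * S + np.outer(k[t], v[t])
        out[t] = q[t] @ S
    return out
```

The parallel form is what makes training as efficient as attention, while the recurrent form gives constant-memory decoding; the two produce identical outputs, which is the paper's central trade-off.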
Related projects:
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆60 · Updated 4 months ago
- Official repository for Efficient Linear-Time Attention Transformers ☆17 · Updated 3 months ago
- HGRN2: Gated Linear RNNs with State Expansion ☆46 · Updated last month
- Linear Attention Sequence Parallelism (LASP) ☆64 · Updated 3 months ago
- GoldFinch and other hybrid transformer components ☆38 · Updated 2 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆48 · Updated last week
- JAX implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆12 · Updated 4 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆34 · Updated 10 months ago
- Large-scale RWKV v6 inference with FLA. Capable of inference combining multiple states (pseudo-MoE). Easy to deploy on Docker. Suppo… ☆15 · Updated 2 weeks ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers ☆57 · Updated 4 months ago
- Scaling Sparse Fine-Tuning to Large Language Models ☆17 · Updated 7 months ago
- An unofficial implementation of the Infini-gram model proposed by Liu et al. (2024) ☆20 · Updated 3 months ago
- Efficient PScan implementation in PyTorch ☆15 · Updated 8 months ago
- PyTorch implementation of "Retentive Network: A Successor to Transformer for Large Language Models" ☆16 · Updated last year
- Contextual Position Encoding with custom CUDA kernels (https://arxiv.org/abs/2405.18719) ☆18 · Updated 3 months ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆78 · Updated 6 months ago
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 · Updated last week
- Triton implementation of the HyperAttention algorithm ☆46 · Updated 9 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆34 · Updated 9 months ago