juvi21 / CoPE-cuda
Contextual Position Encoding (CoPE) with custom CUDA kernels. Paper: https://arxiv.org/abs/2405.18719
☆ 22 · Updated 9 months ago
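CoPE replaces fixed token-index positions with context-dependent ones: a sigmoid gate over each query-key pair decides whether a token "counts", the position of key j relative to query i is the cumulative sum of those gates, and the resulting fractional positions are mapped to embeddings by interpolation. Below is a minimal single-head PyTorch sketch of that computation, following the paper; the function name, shapes, and layout are illustrative assumptions, not this repo's CUDA interface.

```python
# Minimal PyTorch sketch of Contextual Position Encoding (CoPE), following the
# paper (https://arxiv.org/abs/2405.18719). Function name, shapes, and the
# single-head layout are illustrative assumptions, not this repo's CUDA API.
import torch
import torch.nn.functional as F

def cope_attention_logits(q, k, pos_emb):
    """q, k: (seq, dim). pos_emb: (max_pos, dim) learned position embeddings.
    Returns causal attention logits of shape (seq, seq)."""
    seq = q.size(0)
    scores = q @ k.t()                          # raw q_i . k_j logits
    causal = torch.ones(seq, seq).tril().bool()
    # Gate g_ij = sigmoid(q_i . k_j): how much token j "counts" toward position.
    gates = torch.sigmoid(scores) * causal
    # Contextual position p_ij = sum_{t=j..i} g_it: a reversed cumulative sum
    # over the key axis (gates are zero beyond the causal boundary).
    pos = gates.flip(-1).cumsum(-1).flip(-1)
    pos = pos.clamp(max=pos_emb.size(0) - 1)
    # Positions are fractional, so interpolate between the two nearest integer
    # position embeddings. Compute z_i[p] = q_i . e_p once, then gather.
    z = q @ pos_emb.t()                         # (seq, max_pos)
    lo, hi = pos.floor().long(), pos.ceil().long()
    w = pos - pos.floor()                       # interpolation weight
    pos_logits = (1 - w) * z.gather(1, lo) + w * z.gather(1, hi)
    logits = scores + pos_logits
    return logits.masked_fill(~causal, float("-inf"))

# Example: 8 tokens, 16-dim head, positions capped at 8.
q, k = torch.randn(8, 16), torch.randn(8, 16)
probs = F.softmax(cope_attention_logits(q, k, torch.randn(8, 16)), dim=-1)
```

The per-pair gather/interpolate over a (seq, seq) position matrix is the memory-bound step that custom CUDA kernels would plausibly fuse.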
Alternatives and similar repositories for CoPE-cuda:
Users interested in CoPE-cuda are comparing it to the repositories listed below.
- Linear Attention Sequence Parallelism (LASP) · ☆ 79 · Updated 9 months ago
- Code for the preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)" · ☆ 36 · Updated 2 months ago
- An Experiment on Dynamic NTK Scaling RoPE · ☆ 62 · Updated last year
- DPO, but faster 🚀 · ☆ 40 · Updated 3 months ago
- Here we will test various linear attention designs · ☆ 59 · Updated 10 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers · ☆ 46 · Updated last year
- Implementation of the model "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch · ☆ 29 · Updated last month
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆ 25 · Updated 10 months ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang · ☆ 14 · Updated last year
- My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated" · ☆ 31 · Updated 6 months ago
- Train, tune, and infer the Bamba model · ☆ 86 · Updated last month
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in PyTorch · ☆ 53 · Updated last month
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆ 36 · Updated last year
- GoldFinch and other hybrid transformer components · ☆ 44 · Updated 7 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu, et al. · ☆ 44 · Updated 7 months ago
- Repository for sparse finetuning of LLMs via a modified version of the MosaicML llmfoundry · ☆ 40 · Updated last year
- Implementation of "Decoding-time Realignment of Language Models" (ICML 2024) · ☆ 18 · Updated 8 months ago
- Triton implementation of the HyperAttention algorithm · ☆ 47 · Updated last year