vmarinowski / infini-attention
An unofficial PyTorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆52 · Updated 8 months ago
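For context: the mechanism this repository implements combines ordinary causal attention within each segment with a compressive memory that is read before, and updated after, the segment (arXiv:2404.07143). The sketch below is a minimal, illustrative rendering of that step, not code from this repository; the function name, tensor shapes, epsilon, and the choice of the linear (non-delta) update rule are assumptions.

```python
import torch
import torch.nn.functional as F

def infini_attention_segment(q, k, v, memory, z, beta):
    """Single-segment Infini-attention step (illustrative sketch).

    q, k, v : (batch, heads, seg_len, head_dim) projections for this segment
    memory  : (batch, heads, head_dim, head_dim) compressive memory so far
    z       : (batch, heads, head_dim, 1) running normalization term
    beta    : (heads,) learned gate between local and memory attention
    """
    sigma_q = F.elu(q) + 1.0  # positive feature map used by the paper
    sigma_k = F.elu(k) + 1.0

    # Read long-range context from memory accumulated over earlier segments.
    a_mem = (sigma_q @ memory) / (sigma_q @ z + 1e-6)

    # Ordinary causal dot-product attention within the current segment.
    a_local = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Linear-attention-style memory update (the paper also gives a delta-rule variant).
    memory = memory + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)

    # Per-head learned gate mixes memory retrieval with local attention.
    g = torch.sigmoid(beta).view(1, -1, 1, 1)
    return g * a_mem + (1.0 - g) * a_local, memory, z
```

Initializing `memory` and `z` to zeros and threading them through successive segment calls keeps per-segment cost constant, which is what gives the method its bounded-memory "infinite context" behavior.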
Alternatives and similar repositories for infini-attention:
Users interested in infini-attention are comparing it to the repositories listed below.
- A repository for research on medium-sized language models. ☆76 · Updated 11 months ago
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in PyTorch… ☆55 · Updated this week
- My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆31 · Updated 8 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples' ☆76 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year
- This is the official repository for Inheritune. ☆111 · Updated 2 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 7 months ago
- RWKV-7: Surpassing GPT ☆83 · Updated 5 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated last month
- ☆33 · Updated 10 months ago
- Official PyTorch implementation of LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation. ☆13 · Updated 2 weeks ago
- ☆78 · Updated 8 months ago
- GoldFinch and other hybrid transformer components ☆45 · Updated 9 months ago
- ☆50 · Updated 5 months ago
- Efficient Infinite Context Transformers with Infini-attention PyTorch implementation + QwenMoE implementation + training script + 1M cont… ☆82 · Updated 11 months ago
- Layer-Condensed KV cache with 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 · Updated 3 weeks ago
- QuIP quantization ☆51 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆152 · Updated 2 weeks ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆88 · Updated this week
- ☆89 · Updated 7 months ago
- ☆77 · Updated 3 months ago
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆165 · Updated 3 months ago
- Minimal implementation of the "Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models" paper (arXiv:2401.01335) ☆29 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆55 · Updated 7 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆63 · Updated last year
- The official code repo and data hub of the top_nsigma sampling strategy for LLMs. ☆24 · Updated 2 months ago
- Implementation of the Mamba SSM with hf_integration. ☆56 · Updated 7 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 6 months ago
- ☆47 · Updated 7 months ago
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆34 · Updated 2 months ago