jlamprou / Infini-Attention
Efficient Infinite Context Transformers with Infini-attention Pytorch Implementation + QwenMoE Implementation + Training Script + 1M context keypass retrieval
☆86 · May 9, 2024 · Updated last year
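For context, the core mechanism behind this repository and most of the alternatives listed below is Infini-attention's compressive memory (arXiv:2404.07143): each segment attends locally with dot-product attention, reads a linear-attention memory accumulated over previous segments, and mixes the two with a learned gate. The sketch below is a minimal, hedged illustration of that recurrence; the function and variable names (`infini_attention_segment`, `memory`, `z`, `beta`) are assumptions for illustration, not this repository's API.

```python
# Minimal, illustrative sketch of the Infini-attention compressive memory
# (arXiv:2404.07143). Shapes and names are assumptions, not this repo's API.
import torch
import torch.nn.functional as F

def elu_plus_one(x):
    # Non-negative feature map used for the linear-attention memory.
    return F.elu(x) + 1.0

def infini_attention_segment(q, k, v, memory, z, beta):
    """Process one segment: q, k, v have shape (batch, heads, seq, dim_head)."""
    sigma_q, sigma_k = elu_plus_one(q), elu_plus_one(k)

    # Retrieve from the compressive memory accumulated over past segments.
    a_mem = (sigma_q @ memory) / (sigma_q @ z).clamp_min(1e-6)

    # Standard causal dot-product attention within the current segment.
    a_dot = F.scaled_dot_product_attention(q, k, v, is_causal=True)

    # Update memory and its normalizer with the current segment (linear update).
    memory = memory + sigma_k.transpose(-2, -1) @ v
    z = z + sigma_k.sum(dim=-2, keepdim=True).transpose(-2, -1)

    # Learned gate mixes long-term (memory) and local attention outputs.
    gate = torch.sigmoid(beta)
    out = gate * a_mem + (1.0 - gate) * a_dot
    return out, memory, z
```

With `memory` initialized to zeros of shape `(batch, heads, dim_head, dim_head)` and `z` to zeros of shape `(batch, heads, dim_head, 1)`, the function can be applied segment by segment, so context length grows without growing the attention state.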
Alternatives and similar repositories for Infini-Attention
Users interested in Infini-Attention are comparing it to the libraries listed below
- Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with I… ☆374 · Apr 23, 2024 · Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆294 · May 4, 2024 · Updated last year
- An unofficial pytorch implementation of 'Efficient Infinite Context Transformers with Infini-attention' ☆54 · Aug 19, 2024 · Updated last year
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 · Updated this week
- Pytorch implementation of https://arxiv.org/html/2404.07143v1 ☆21 · Apr 13, 2024 · Updated last year
- An unofficial implementation of "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆36 · Jun 7, 2024 · Updated last year
- DPO, but faster 🚀 ☆47 · Dec 6, 2024 · Updated last year
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo… ☆15 · Nov 11, 2024 · Updated last year
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆11 · Jun 18, 2024 · Updated last year
- ☆16 · Jul 29, 2025 · Updated 6 months ago
- An implementation of the paper "Retentive Network: A Successor to Transformer for Large Language Models" (https://arxiv.org/pdf/2307.08621.pdf) ☆11 · Jul 25, 2023 · Updated 2 years ago
- BlockRank makes LLMs efficient and scalable for RAG and in-context ranking ☆41 · Dec 12, 2025 · Updated 2 months ago
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023 ☆12 · Dec 13, 2023 · Updated 2 years ago
- [ISBI 2024] Semi-supervised Medical Image Segmentation Method Based on Cross-pseudo Labeling Leveraging Strong and Weak Data Augmentation… ☆16 · Feb 23, 2025 · Updated 11 months ago
- Official code for the NeurIPS 2025 Paper: C3Po: Cross-View Cross-Modality Correspondence by Pointmap Prediction ☆23 · Jan 27, 2026 · Updated 2 weeks ago
- Trying to deconstruct RWKV in understandable terms ☆14 · May 6, 2023 · Updated 2 years ago
- Notebooks for docarray, Jina, Finetuner, and other products from Jina AI ☆12 · Mar 31, 2022 · Updated 3 years ago
- An implementation of LazyLLM token pruning for the LLaMa 2 model family. ☆13 · Jan 6, 2025 · Updated last year
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆14 · Mar 18, 2025 · Updated 10 months ago
- This repository contains the code for the paper "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back)… ☆13 · Mar 17, 2025 · Updated 10 months ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆445 · Oct 16, 2024 · Updated last year
- [CVPR 2025] Official Implementation of LOCORE: Image Re-ranking with Long-Context ☆15 · Apr 15, 2025 · Updated 10 months ago
- An unofficial implementation of the SOLAR-10.7B model and the newly proposed interlocked-DUS (iDUS), with implementation and experiment details. ☆14 · Mar 20, 2024 · Updated last year
- Code for Blog Post: Can Better Cold-Start Strategies Improve RL Training for LLMs? ☆19 · Mar 9, 2025 · Updated 11 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆35 · Mar 7, 2025 · Updated 11 months ago
- The repository of the paper "Personalized Multimodal Response Generation with Large Language Models" ☆17 · Jun 28, 2024 · Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆341 · Feb 23, 2025 · Updated 11 months ago
- Set of scripts to finetune LLMs ☆38 · Mar 30, 2024 · Updated last year
- Code for my ICLR 2024 TinyPapers paper "Prune and Tune: Improving Efficient Pruning Techniques for Massive Language Models" ☆16 · May 26, 2023 · Updated 2 years ago
- zero-vocab or low-vocab embeddings ☆18 · Jul 17, 2022 · Updated 3 years ago
- A token pruning method that accelerates ViTs for various tasks while maintaining high performance. ☆28 · Jul 21, 2025 · Updated 6 months ago
- Neo4j Cybersecurity Demo ☆17 · Mar 16, 2022 · Updated 3 years ago
- Training GPTs to solve interaction nets ☆18 · Aug 14, 2024 · Updated last year
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆41 · Oct 11, 2024 · Updated last year
- Data visualizations for biomolecular dynamics ☆17 · Oct 30, 2018 · Updated 7 years ago
- The official implementation of ICLR 2025 paper "Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models". ☆18 · Apr 25, 2025 · Updated 9 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆79 · Jun 17, 2024 · Updated last year
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models ☆79 · Oct 16, 2024 · Updated last year
- ☆14 · Jul 26, 2023 · Updated 2 years ago