vmarinowski / infini-attention
An unofficial PyTorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆54 Aug 19, 2024 Updated last year
Alternatives and similar repositories for infini-attention
Users who are interested in infini-attention are comparing it to the libraries listed below.
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO… ☆58 Feb 9, 2026 Updated last week
- Efficient Infinite Context Transformers with Infini-attention PyTorch Implementation + QwenMoE Implementation + Training Script + 1M cont… ☆86 May 9, 2024 Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆294 May 4, 2024 Updated last year
- A simple and minimal open source implementation of "Introducing LFM2: The Fastest On-Device Foundation Models on the Market" from Liquid … ☆21 Feb 9, 2026 Updated last week
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 Mar 11, 2024 Updated last year
- replacement of AdamW and Lion optimizer for LLMs ☆13 May 28, 2023 Updated 2 years ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 Aug 30, 2024 Updated last year
- Community Open Source Implementation of GPT4o in PyTorch ☆26 Feb 9, 2026 Updated last week
- Implementation of "PaLM2-VAdapter:" from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron… ☆17 Nov 11, 2024 Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 Jun 11, 2023 Updated 2 years ago
- Compute WER and SER for speech recognition evaluation ☆26 Dec 15, 2025 Updated 2 months ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆67 Apr 24, 2024 Updated last year
- An implementation of the base GPT-3 Model architecture from the paper by OpenAI "Language Models are Few-Shot Learners" ☆20 Jun 29, 2024 Updated last year
- This is a simple torch implementation of the high performance Multi-Query Attention ☆16 Aug 23, 2023 Updated 2 years ago
- Automatically remove watermarks from illustrations using AI (Stable Diffusion). ☆20 Dec 17, 2024 Updated last year
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine ☆25 Dec 20, 2024 Updated last year
- Fine-tune OpenAI models with your Discord chat history ☆27 Jul 29, 2025 Updated 6 months ago
- ☆24 Dec 16, 2024 Updated last year
- Create transparent image with Diffusers! ☆59 Feb 4, 2025 Updated last year
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper ☆150 Jul 20, 2024 Updated last year
- Playing with Agents ☆36 Jan 17, 2025 Updated last year
- PyTorch Implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆25 Feb 9, 2026 Updated last week
- A super simple web interface to perform blind tests on LLM outputs. ☆29 Mar 9, 2024 Updated last year
- PyTorch Implementation of the Model from "MIRASOL3B: A MULTIMODAL AUTOREGRESSIVE MODEL FOR TIME-ALIGNED AND CONTEXTUAL MODALITIES" ☆26 Jan 27, 2025 Updated last year
- LCM Full Cycle Trainer for Ostris - Ai Toolkit ☆16 Aug 20, 2024 Updated last year
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆28 Feb 10, 2026 Updated last week
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping. ☆27 Apr 21, 2023 Updated 2 years ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆115 Apr 22, 2025 Updated 9 months ago
- Gradio Client in Rust. ☆28 Nov 30, 2025 Updated 2 months ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆28 Feb 9, 2026 Updated last week
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench. ☆113 Jul 27, 2024 Updated last year
- A chess arena for large language models ☆39 May 22, 2025 Updated 8 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training ☆36 Jun 20, 2025 Updated 7 months ago
- Unofficial API Wrapper for Deepseek (chat.deepseek.com) ☆72 Aug 5, 2025 Updated 6 months ago
- Gibsonify — Collect nutritional data using Gibson's method! ☆11 Oct 28, 2023 Updated 2 years ago
- Gemma 2B with 10M context length using Infini-attention. ☆935 May 12, 2024 Updated last year
- Video+code lecture on building nanoGPT from scratch ☆68 Jun 14, 2024 Updated last year
- Implementation of Google's USM speech model in PyTorch ☆34 Feb 7, 2026 Updated last week
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024) ☆209 May 20, 2024 Updated last year