An unofficial PyTorch implementation of 'Efficient Infinite Context Transformers with Infini-attention'
☆55Aug 19, 2024Updated last year
Alternatives and similar repositories for infini-attention
Users who are interested in infini-attention compare it with the libraries listed below.
- Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…☆58Feb 9, 2026Updated last month
- Efficient Infinite Context Transformers with Infini-attention PyTorch implementation + QwenMoE implementation + training script + 1M cont…☆86May 9, 2024Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…☆297May 4, 2024Updated last year
- Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with I…☆375Apr 23, 2024Updated last year
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant …☆15Mar 11, 2024Updated last year
- A simple and minimal open source implementation of "Introducing LFM2: The Fastest On-Device Foundation Models on the Market" from Liquid …☆23Mar 2, 2026Updated last week
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization.☆17Aug 30, 2024Updated last year
- Replacement for the AdamW and Lion optimizers for LLMs☆13May 28, 2023Updated 2 years ago
- Community Open Source Implementation of GPT4o in PyTorch☆26Feb 9, 2026Updated last month
- Implementation of PaLM2-VAdapter from the multi-modal model paper: "PaLM2-VAdapter: Progressively Aligned Language Model Makes a Stron…☆17Nov 11, 2024Updated last year
- ATLAS is a sophisticated real-time risk analysis system designed for institutional-grade market risk assessment. Built with high-frequenc…☆17Jan 13, 2025Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX☆14Jun 11, 2023Updated 2 years ago
- Compute WER and SER for speech recognition evaluation☆26Dec 15, 2025Updated 2 months ago
- A simple PyTorch implementation of high-performance Multi-Query Attention☆16Aug 23, 2023Updated 2 years ago
- An implementation of the base GPT-3 model architecture from the OpenAI paper "Language Models are Few-Shot Learners"☆20Jun 29, 2024Updated last year
- PyTorch Implementation of the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training"☆26Feb 9, 2026Updated last month
- Automatically remove watermarks from illustrations using AI (Stable Diffusion).☆20Dec 17, 2024Updated last year
- Video Diffusion State Space Models☆19Mar 27, 2024Updated last year
- Proteus is an experimental platform that combines the power of Large Language Models with the Genesis physics engine☆26Dec 20, 2024Updated last year
- Another implementation of Hinton's capsule networks in TensorFlow.☆19Feb 19, 2018Updated 8 years ago
- Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper☆150Jul 20, 2024Updated last year
- Playing with Agents☆37Jan 17, 2025Updated last year
- PyTorch implementation of the model from "Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities"☆26Jan 27, 2025Updated last year
- PyTorch implementation of the paper: "Learning to (Learn at Test Time): RNNs with Expressive Hidden States"☆25Feb 13, 2026Updated 3 weeks ago
- ☆119Dec 18, 2024Updated last year
- Evaluate your agent memory on real-world dialogues, not LLM-simulated dialogues.☆39Jul 3, 2025Updated 8 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO☆29Mar 3, 2026Updated last week
- EvaByte: Efficient Byte-level Language Models at Scale☆115Apr 22, 2025Updated 10 months ago
- Gradio Client in Rust.☆28Nov 30, 2025Updated 3 months ago
- This repository contains code for cleaning your training data of benchmark data to help combat data snooping.☆27Apr 21, 2023Updated 2 years ago
- [NeurIPS '24 D&B] Official Dataloader and Evaluation Scripts for LongVideoBench.☆113Jul 27, 2024Updated last year
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training☆36Jun 20, 2025Updated 8 months ago
- A chess arena for large language models☆39May 22, 2025Updated 9 months ago
- [ICCV 2025] Dynamic-VLM☆28Dec 16, 2024Updated last year
- Modification of daveshap/ChromaDB_Chatbot_Public that lets end users customize the behavior/memories of the chatbot☆13Jun 30, 2023Updated 2 years ago
- Gibsonify — Collect nutritional data using Gibson's method!☆11Oct 28, 2023Updated 2 years ago
- Gemma 2B with 10M context length using Infini-attention.☆936May 12, 2024Updated last year
- Video+code lecture on building nanoGPT from scratch☆68Jun 14, 2024Updated last year
- ☆117Jan 16, 2026Updated last month