dingo-actual / infini-transformerLinks

PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)

☆294

Alternatives and similar repositories for infini-transformer

Users that are interested in infini-transformer are comparing it to the libraries listed below

Sorting:

Beomi / InfiniTransformer
Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with I…
☆373Updated last year
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360Updated 11 months ago
sanderwood / bgpt
Beyond Language Models: Byte Models are Digital World Simulators
☆331Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
pratyushasharma / laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
☆389Updated last year
catid / dora
Implementation of DoRA
☆307Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆561Updated 11 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202Updated last year
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆232Updated last month
zyushun / Adam-mini
Code for Adam-mini: Use Fewer Learning Rates To Gain More https://arxiv.org/abs/2406.16793
☆445Updated 6 months ago
llm-random / llm-random
☆205Updated this week
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆217Updated last year
lucidrains / ring-attention-pytorch
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch
☆545Updated 6 months ago
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆406Updated last year
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆750Updated last year
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆145Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆175Updated last year
FranxYao / Long-Context-Data-Engineering
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
☆478Updated last year
jshuadvd / LongRoPE
Implementation of the LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper
☆152Updated last year
microsoft / LongRoPE
LongRoPE is a novel method that can extends the context window of pre-trained LLMs to an impressive 2048k tokens.
☆272Updated last month
XuezheMax / megalodon
Reference implementation of Megalodon 7B model
☆525Updated 6 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
NVlabs / Minitron
A family of compressed models obtained via pruning and knowledge distillation
☆357Updated 3 weeks ago
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆293Updated 6 months ago
yxli2123 / LoftQ
☆235Updated last year
microsoft / rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆447Updated last year
FasterDecoding / BitDelta
☆203Updated 11 months ago