lucidrains / memorizing-transformers-pytorch

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

☆637

Alternatives and similar repositories for memorizing-transformers-pytorch

Users who are interested in memorizing-transformers-pytorch are comparing it to the libraries listed below.

HazyResearch / H3
Language Modeling with the H3 State Space Model
☆519 Updated 2 years ago
lucidrains / RETRO-pytorch
Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
☆876 Updated 2 years ago
abertsch72 / unlimiformer
Public repo for the NeurIPS 2023 paper "Unlimiformer: Long-Range Transformers with Unlimited Length Input"
☆1,063 Updated last year
HazyResearch / safari
Convolutions for Sequence Modeling
☆903 Updated last year
lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆418 Updated 10 months ago
JonasGeiping / cramming
Cramming the training of a (BERT-type) language model into limited compute.
☆1,352 Updated last year
lucidrains / PaLM-pytorch
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways
☆826 Updated 3 years ago
google-research / meliad
☆259 Updated 5 months ago
PiotrNawrot / nanoT5
Fast & Simple repository for pre-training and fine-tuning T5-style models
☆1,014 Updated last year
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆546 Updated 2 years ago
sanjeevanahilan / nanoChatGPT
A crude RLHF layer on top of nanoGPT with Gumbel-Softmax trick
☆293 Updated 2 years ago
lucidrains / MEGABYTE-pytorch
Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch
☆654 Updated 11 months ago
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive models within a GPT
☆224 Updated last year
CarperAI / cheese
Used for adaptive human in the loop evaluation of language and embedding models.
☆308 Updated 2 years ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230 Updated last year
HazyResearch / m2
Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture"
☆561 Updated 10 months ago
r-three / t-few
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"
☆456 Updated 2 years ago
google / flaxformer
☆363 Updated last year
HazyResearch / ama_prompting
Ask Me Anything language model prompting
☆547 Updated 2 years ago
pbelcak / UltraFastBERT
The repository for the code of the UltraFastBERT paper
☆520 Updated last year
alasdairforsythe / tokenmonster
Ungreedy subword tokenizer and vocabulary trainer for Python, Go & Javascript
☆606 Updated last year
zphang / minimal-llama
☆457 Updated 2 years ago
Liuhong99 / Sophia
The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training”
☆979 Updated last year
google-deepmind / tracr
☆547 Updated last year
changjonathanc / minLoRA
minLoRA: a minimal PyTorch library that allows you to apply LoRA to any PyTorch model.
☆486 Updated 2 years ago
persimmon-ai-labs / adept-inference
Inference code for Persimmon-8B
☆412 Updated 2 years ago
rom1504 / cc2dataset
Easily convert common crawl to a dataset of caption and document. Image/text Audio/text Video/text, ...
☆321 Updated last year
kyegomez / Sophia
Effortless plugin and play Optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs.
☆384 Updated last year
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆427 Updated last year
facebookresearch / mega
Sequence modeling with Mega.
☆301 Updated 2 years ago