booydar / LM-RMTLinks

Recurrent Memory Transformer

☆154

Alternatives and similar repositories for LM-RMT

Users that are interested in LM-RMT are comparing it to the libraries listed below

Sorting:

McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆138Updated last year
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆83Updated last year
lucidrains / recurrent-memory-transformer-pytorch
Implementation of Recurrent Memory Transformer, Neurips 2022 paper, in Pytorch
☆420Updated 10 months ago
deep-spin / infinite-former
☆67Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆342Updated 5 months ago
lucidrains / CoLT5-attention
Implementation of the conditionally routed attention in the CoLT5 architecture, in Pytorch
☆230Updated last year
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆178Updated last year
lucidrains / simple-hierarchical-transformer
Experiments around a simple idea for inducing multiple hierarchical predictive model within a GPT
☆224Updated last year
microsoft / AdaMix
This is the implementation of the paper AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning (https://arxiv.org/abs/2205.1…
☆136Updated 2 years ago
tianjunz / HIR
☆159Updated 2 years ago
Dahoas / reward-modeling
☆98Updated 2 years ago
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227Updated 2 years ago
PiotrNawrot / dynamic-pooling
Efficient Transformers with Dynamic Token Pooling
☆65Updated 2 years ago
imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆202Updated last year
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆266Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆169Updated 2 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆217Updated last year
facebookresearch / SemDeDup
Code for "SemDeDup", a simple method for identifying and removing semantic duplicates from a dataset (data pairs which are semantically s…
☆147Updated 2 years ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
thomfoster / minRLHF
A (somewhat) minimal library for finetuning language models with PPO on human feedback.
☆87Updated 3 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆301Updated 2 years ago
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆169Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year
HazyResearch / TART
TART: A plug-and-play Transformer module for task-agnostic reasoning
☆201Updated 2 years ago
srush / do-we-need-attention
☆166Updated 2 years ago
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆80Updated last year