augustwester / transformer-xl

A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019).

☆37

Alternatives and similar repositories for transformer-xl

Users interested in transformer-xl are comparing it to the libraries listed below.


Sea-Snell / JAXSeq
Train very large language models in JAX.
☆209 · Updated 2 years ago
sholtodouglas / scalingExperiments
☆62 · Updated 3 years ago
dvruette / barrel-rec-pytorch
☆53 · Updated last year
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87 · Updated last year
Sea-Snell / CALM-Dialogue
Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems"
☆34 · Updated 2 years ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU training
☆51 · Updated last year
NohTow / PPL-MCTS
Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22
☆66 · Updated 3 years ago
hundredblocks / large-model-parallelism
Functional local implementations of main model parallelism approaches
☆96 · Updated 2 years ago
microsoft / mutransformers
Some common Hugging Face transformers in maximal update parametrization (µP)
☆86 · Updated 3 years ago
irhum / hyena
JAX/Flax implementation of the Hyena Hierarchy
☆34 · Updated 2 years ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆209 · Updated 2 years ago
Cornell-RL / tril
☆128 · Updated last year
CEC-Agent / CEC
Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents"
☆31 · Updated 2 years ago
okarthikb / DPO
Implementation of Direct Preference Optimization
☆16 · Updated 2 years ago
lucidrains / PaLM-jax
Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in JAX (Equinox framework)
☆188 · Updated 3 years ago
microsoft / greenlands
Platform to run interactive Reinforcement Learning agents in a Minecraft Server
☆53 · Updated last year
ml-jku / LRAM
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
☆35 · Updated 11 months ago
google-research / jestimator
Amos optimizer with JEstimator lib.
☆82 · Updated last year
Sea-Snell / JAX_llama
Inference code for LLaMA models in JAX
☆119 · Updated last year
cloneofsimo / min-max-gpt
Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training
☆132 · Updated last year
vvvm23 / mamba-jax
Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX
☆88 · Updated last year
princeton-nlp / TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
☆162 · Updated last year
huggingface / simulate
🎢 Creating and sharing simulation environments for embodied and synthetic data research
☆190 · Updated 2 years ago
mgrankin / minGPT
minGPT in JAX
☆48 · Updated 3 years ago
CarperAI / Algorithm-Distillation-RLHF
☆36 · Updated 2 years ago
keyonvafa / world-model-evaluation
☆68 · Updated 11 months ago
NousResearch / StripedHyenaTrainer
☆61 · Updated last year
radarFudan / mamba-minimal-jax
☆34 · Updated 11 months ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆52 · Updated 2 years ago
ayaka14732 / llama-2-jax
JAX implementation of the Llama 2 model
☆216 · Updated last year