augustwester / transformer-xl
A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019)
☆37 · Updated last year
Alternatives and similar repositories for transformer-xl:
Users who are interested in transformer-xl are comparing it to the repositories listed below.
- Minimal but scalable implementation of large language models in JAX ☆28 · Updated 2 months ago
- ☆53 · Updated last year
- Machine Learning eXperiment Utilities ☆45 · Updated 7 months ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆40 · Updated 7 months ago
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆36 · Updated last year
- Implementation of Direct Preference Optimization ☆15 · Updated last year
- Learn online intrinsic rewards from LLM feedback ☆34 · Updated last month
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆47 · Updated last year
- ☆33 · Updated 4 months ago
- ☆41 · Updated last year
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) ☆20 · Updated 5 months ago
- ☆30 · Updated 2 months ago
- Latent Diffusion Language Models ☆68 · Updated last year
- Efficient World Models with Context-Aware Tokenization. ICML 2024 ☆89 · Updated 4 months ago
- ☆76 · Updated 6 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆79 · Updated 2 years ago
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" ☆34 · Updated 2 years ago
- Repository for the code of the "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" paper, NAACL'22 ☆65 · Updated 2 years ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆121 · Updated 9 months ago
- Official Implementation of NeurIPS'23 Paper "Cross-Episodic Curriculum for Transformer Agents" ☆31 · Updated last year
- JAX/Flax implementation of the Hyena Hierarchy ☆33 · Updated last year
- Implementation of GateLoop Transformer in Pytorch and Jax ☆87 · Updated 7 months ago
- ☆60 · Updated last year
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆47 · Updated 7 months ago
- Implementation of Token Shift GPT - An autoregressive model that solely relies on shifting the sequence space for mixing ☆47 · Updated 3 years ago
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆30 · Updated last month
- ☆75 · Updated 6 months ago
- A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks ☆28 · Updated 2 months ago
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023 ☆130 · Updated 9 months ago