augustwester / transformer-xl
A lightweight PyTorch implementation of the Transformer-XL architecture proposed by Dai et al. (2019)
☆37 · Updated 2 years ago
Alternatives and similar repositories for transformer-xl
Users interested in transformer-xl are comparing it to the libraries listed below.
- Official code for the paper "Context-Aware Language Modeling for Goal-Oriented Dialogue Systems" · ☆34 · Updated 2 years ago
- ☆53 · Updated last year
- Learn online intrinsic rewards from LLM feedback · ☆41 · Updated 6 months ago
- HomebrewNLP in JAX flavour for maintainable TPU training · ☆50 · Updated last year
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus) · ☆20 · Updated 10 months ago
- Minimal but scalable implementation of large language models in JAX · ☆35 · Updated 7 months ago
- Triton implementation of the HyperAttention algorithm · ☆48 · Updated last year
- JAX notebook showing how to LoRA + GPTQ arbitrary models · ☆10 · Updated last year
- ☆61 · Updated last year
- Some common Hugging Face transformers in maximal update parametrization (µP) · ☆81 · Updated 3 years ago
- Sparse and discrete interpretability tool for neural networks · ☆63 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)* · ☆84 · Updated last year
- ☆53 · Updated last year
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" · ☆38 · Updated 2 weeks ago
- ☆78 · Updated 11 months ago
- Scaling scaling laws with board games · ☆49 · Updated last year
- ☆45 · Updated last year
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) · ☆51 · Updated 3 months ago
- PyTorch implementation of "Long Horizon Temperature Scaling" (ICML 2023) · ☆20 · Updated 2 years ago
- Utilities for Training Very Large Models · ☆58 · Updated 9 months ago
- Code for the NAACL'22 paper "PPL-MCTS: Constrained Textual Generation Through Discriminator-Guided Decoding" · ☆66 · Updated 2 years ago
- Latent Diffusion Language Models · ☆68 · Updated last year
- ☆34 · Updated 9 months ago
- Implementation of the Llama architecture with RLHF + Q-learning · ☆165 · Updated 4 months ago
- Code associated with papers on superposition (in ML interpretability) · ☆28 · Updated 2 years ago
- Inference code for LLaMA models in JAX · ☆118 · Updated last year
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models · ☆58 · Updated 4 months ago
- ☆49 · Updated last year
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers" (NeurIPS 2023) · ☆136 · Updated last year
- Train very large language models in JAX · ☆205 · Updated last year