seal-rg / recurrent-pretraining
Pretraining code for a large-scale depth-recurrent language model
☆697 · Updated 2 weeks ago
Alternatives and similar repositories for recurrent-pretraining:
Users interested in recurrent-pretraining are comparing it to the repositories listed below.
- Training Large Language Model to Reason in a Continuous Latent Space ☆998 · Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆310 · Updated 3 months ago
- Muon is Scalable for LLM Training ☆974 · Updated last month
- Recipes to scale inference-time compute of open models ☆1,044 · Updated last month
- ☆485 · Updated last week
- Build your own visual reasoning model