google-deepmind / scaling_laws_for_routing
☆13 · updated 3 years ago
Alternatives and similar repositories for scaling_laws_for_routing
Users interested in scaling_laws_for_routing are comparing it to the repositories listed below.
- [NeurIPS 2023] Learning Transformer Programs · ☆162 · updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) · ☆80 · updated last year
- ☆84 · updated last year
- A Kernel-Based View of Language Model Fine-Tuning (https://arxiv.org/abs/2210.05643) · ☆78 · updated last year
- ☆147 · updated 2 years ago
- ☆83 · updated last year
- ☆31 · updated 2 years ago
- Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning" · ☆209 · updated 2 years ago
- Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models · ☆46 · updated last year
- Retrieval as Attention · ☆83 · updated 2 years ago
- ☆27 · updated 6 months ago
- Code for Neural Execution Engines: Learning to Execute Subroutines · ☆17 · updated 4 years ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…" · ☆108 · updated last year
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" · ☆108 · updated 2 years ago
- Simple and efficient pytorch-native transformer training and inference (batched) · ☆78 · updated last year
- ☆53 · updated last year
- Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers" (NeurIPS 2023) · ☆136 · updated last year
- This package implements THOR: Transformer with Stochastic Experts. · ☆65 · updated 3 years ago
- ☆184 · updated last year
- ☆97 · updated 2 years ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" · ☆119 · updated last year
- ☆159 · updated 2 years ago
- RLHF implementation details of OAI's 2019 codebase · ☆187 · updated last year
- ☆119 · updated last year
- Language models scale reliably with over-training and on downstream tasks · ☆97 · updated last year
- Scaling Data-Constrained Language Models · ☆338 · updated last month
- ☆84 · updated 6 months ago
- ☆45 · updated last year
- Simple next-token-prediction for RLHF · ☆227 · updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le… · ☆94 · updated 3 years ago