google-deepmind / scaling_laws_for_routing
☆13Updated 2 years ago
Alternatives and similar repositories for scaling_laws_for_routing:
Users who are interested in scaling_laws_for_routing are comparing it to the repositories listed below
- A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643☆76Updated last year
- ☆84Updated last year
- ☆82Updated 8 months ago
- ☆39Updated 2 years ago
- Simple and efficient PyTorch-native transformer training and inference (batched)☆73Updated last year
- A toolkit for scaling law research ⚖☆49Updated 3 months ago
- [NeurIPS 2023] Learning Transformer Programs☆159Updated 11 months ago
- ☆49Updated last year
- ☆45Updated this week
- Reproduction of "RLCD: Reinforcement Learning from Contrast Distillation for Language Model Alignment"☆68Updated last year
- ☆13Updated 10 months ago
- ☆20Updated 2 months ago
- Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…☆105Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)☆80Updated last year
- Pile Deduplication Code☆17Updated last year
- Official code of the paper Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Le…☆73Updated last year
- ☆45Updated last year
- ☆31Updated last year
- ☆49Updated last year
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated 2 years ago
- ☆51Updated 11 months ago
- ☆54Updated last year
- The repository contains code for Adaptive Data Optimization☆24Updated 4 months ago
- ☆47Updated last year
- Official GitHub repo for the paper "Compression Represents Intelligence Linearly" [COLM 2024]☆132Updated 7 months ago
- Code for Neural Execution Engines: Learning to Execute Subroutines☆17Updated 4 years ago
- ☆67Updated 2 years ago
- ☆33Updated last year
- The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…☆90Updated 3 years ago
- ☆64Updated 7 months ago