google-deepmind / scaling_laws_for_routingLinks

☆13

Alternatives and similar repositories for scaling_laws_for_routing

Users that are interested in scaling_laws_for_routing are comparing it to the libraries listed below

Sorting:

princeton-nlp / TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
☆162Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80Updated last year
google-deepmind / emergent_in_context_learning
☆84Updated last year
princeton-nlp / LM-Kernel-FT
A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
☆78Updated last year
epfml / dynamic-sparse-flash-attention
☆147Updated 2 years ago
lee-ny / teaching_arithmetic
☆83Updated last year
Shark-NLP / CAB
☆31Updated 2 years ago
Sea-Snell / Implicit-Language-Q-Learning
Official code from the paper "Offline RL for Natural Language Generation with Implicit Language Q Learning"
☆209Updated 2 years ago
HazyResearch / skill-it
Skill-It! A Data-Driven Skills Framework for Understanding and Training Language Models
☆46Updated last year
jzbjyb / ReAtt
Retrieval as Attention
☆83Updated 2 years ago
DSL-Lab / aops
☆27Updated 6 months ago
Yujun-Yan / Neural-Execution-Engines
Code for Neural Execution Engines: Learning to Execute Subroutines
☆17Updated 4 years ago
p-lambda / incontext-learning
Experiments and code to generate the GINC small-scale in-context learning dataset from "An Explanation for In-context Learning as Implici…
☆108Updated last year
xlang-ai / icl-selective-annotation
[ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"
☆108Updated 2 years ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆78Updated last year
swj0419 / in-context-pretraining
☆53Updated last year
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆136Updated last year
microsoft / Stochastic-Mixture-of-Experts
This package implements THOR: Transformer with Stochastic Experts.
☆65Updated 3 years ago
GFNOrg / gfn-lm-tuning
☆184Updated last year
bigscience-workshop / architecture-objective
☆97Updated 2 years ago
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119Updated last year
tianjunz / HIR
☆159Updated 2 years ago
vwxyzjn / lm-human-preference-details
RLHF implementation details of OAI's 2019 codebase
☆187Updated last year
shunzh / Code-AI-Tree-Search
☆119Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆338Updated last month
protagolabs / odyssey-math
☆84Updated 6 months ago
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated last year
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227Updated last year
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆94Updated 3 years ago