formll / resolving-scaling-law-discrepanciesLinks

☆20

Alternatives and similar repositories for resolving-scaling-law-discrepancies

Users that are interested in resolving-scaling-law-discrepancies are comparing it to the libraries listed below

Sorting:

mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆29Updated 3 weeks ago
katiekang1998 / reasoning_generalization
☆33Updated 9 months ago
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆18Updated last month
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21Updated last year
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆31Updated 9 months ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17Updated 6 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84Updated 11 months ago
yidingjiang / ado
The repository contains code for Adaptive Data Optimization
☆26Updated 10 months ago
Hritikbansal / jpo
☆13Updated 3 months ago
sail-sg / dice
Official implementation of Bootstrapping Language Models via DPO Implicit Rewards
☆44Updated 6 months ago
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36Updated last year
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆35Updated last week
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
ablghtianyi / ICL_Modular_Arithmetic
☆19Updated 7 months ago
abhishekpanigrahi1996 / transformer_in_transformer
☆45Updated 2 years ago
gregorbachmann / Next-Token-Failures
☆103Updated last year
ahans30 / goldfish-loss
[NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs
☆92Updated 11 months ago
sjelassi / transformers_ssm_copy
☆33Updated last year
hamishivi / automated-instruction-selection
Exploration of automated dataset selection approaches at large scales.
☆47Updated 7 months ago
nathanhu0 / CaMeLS
Codebase for Context-aware Meta-learned Loss Scaling (CaMeLS). https://arxiv.org/abs/2305.15076.
☆25Updated last year
ctlllll / reward_collapse
☆27Updated 2 years ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆100Updated last year
MadryLab / DsDm
☆50Updated last year
yale-nlp / refdpo
☆16Updated last year
r-three / RAD
Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model
☆43Updated 3 weeks ago
liziniu / GEM
Code for Paper (Preserving Diversity in Supervised Fine-tuning of Large Language Models)
☆40Updated 5 months ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48Updated last year
googleinterns / localizing-paragraph-memorization
☆15Updated last year
lingo-mit / lm-truthfulness
☆17Updated last year
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆61Updated 2 months ago