HanseulJo / position-coupling

Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count (ICLR 2025)

☆11

Alternatives and similar repositories for position-coupling

Users who are interested in position-coupling are comparing it to the repositories listed below

formll / resolving-scaling-law-discrepancies
☆20 Updated this week
r-three / mats
☆31 Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 Updated 2 years ago
facebookresearch / ModelRatatouille
Recycling diverse models
☆46 Updated 2 years ago
VITA-Group / Junk_DNA_Hypothesis
[ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So…
☆16 Updated 6 months ago
mmatena / model_merging
☆78 Updated 3 years ago
locuslab / edge-of-stability
☆71 Updated 11 months ago
Hritikbansal / jpo
☆13 Updated 4 months ago
MadryLab / datamodels-data
Data for "Datamodels: Predicting Predictions with Training Data"
☆97 Updated 2 years ago
socialfoundations / tttlm
Test-time-training on nearest neighbors for large language models
☆46 Updated last year
JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆31 Updated 2 years ago
DeqingFu / transformers-icl-second-order
Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode…
☆19 Updated 11 months ago
tml-epfl / sharpness-vs-generalization
A modern look at the relationship between sharpness and generalization [ICML 2023]
☆43 Updated 2 years ago
varunnair18 / FISH
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).
☆59 Updated 3 years ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 Updated last year
gortizji / tangent_task_arithmetic
Source code of "Task arithmetic in the tangent space: Improved editing of pre-trained models".
☆105 Updated 2 years ago
IBM / ColPret
Efficient Scaling laws and collaborative pretraining.
☆18 Updated last month
mcleish7 / gemstone-scaling-laws
Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025)
☆29 Updated last month
HSG-AIML / NeurIPS_2022-Generative_Hyper_Representations
Code Repository for the NeurIPS 2022 paper: "Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights".
☆17 Updated last year
haotiansun14 / BBox-Adapter
Lightweight Adapting for Black-Box Large Language Models
☆24 Updated last year
tml-epfl / icl-alignment
Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]
☆31 Updated 9 months ago
KempnerInstitute / llm_uncertainty
Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models"
☆10 Updated last year
janphilippfranken / sami
Self-Supervised Alignment with Mutual Information
☆21 Updated last year
abhishekpanigrahi1996 / transformer_in_transformer
☆45 Updated 2 years ago
matchten / LoRA-Models-for-SAEs
Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features"
☆17 Updated 7 months ago
Qualcomm-AI-research / llm-surgeon
☆34 Updated last year
sjelassi / transformers_ssm_copy
☆35 Updated last year
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆55 Updated 9 months ago
ablghtianyi / ICL_Modular_Arithmetic
☆19 Updated 7 months ago
erosenfeld / disagree_discrep
Provably (and non-vacuously) bounding test error of deep neural networks under distribution shift with unlabeled test data.
☆10 Updated last year