HanseulJo / position-coupling
Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count (ICLR 2025)
☆14 · Updated 3 months ago
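For context, the repository's core idea can be illustrated in a few lines. The sketch below is a hypothetical simplification, not the repo's actual API: for an addition task, digits of the same significance in each operand and in the answer receive the same position ID, so digit alignment no longer depends on operand length. The function name `coupled_position_ids`, the separator-ID choices, the least-significant-first digit ordering, and the random-offset range are all assumptions made for illustration.

```python
import random

def coupled_position_ids(a: str, b: str, answer: str, max_offset: int = 50):
    """Assign position IDs for the sequence 'a+b=answer' so that digits of
    equal significance share one ID (illustrative sketch, not the repo's API)."""
    # Digits are processed least-significant first, so index i == significance i.
    # A random shift discourages the model from memorizing absolute positions.
    offset = random.randint(1, max_offset)
    tokens, pos_ids = [], []
    for operand, sep in ((a, "+"), (b, "=")):
        for i, digit in enumerate(reversed(operand)):
            tokens.append(digit)
            pos_ids.append(offset + i)
        tokens.append(sep)
        pos_ids.append(offset + len(operand))  # separator ID: one past the operand (assumption)
    for i, digit in enumerate(reversed(answer)):
        tokens.append(digit)
        pos_ids.append(offset + i)
    return tokens, pos_ids

# Example: 123 + 45 = 168; the units digits '3', '5', and '8' share one position ID.
print(coupled_position_ids("123", "45", "168"))
```

Because the answer's i-th digit shares an ID with the operands' i-th digits, attention can match them by position ID alone, which is the structural cue the papers exploit for length generalization.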
Alternatives and similar repositories for position-coupling
Users interested in position-coupling are comparing it to the repositories listed below
- ☆20 · Updated 3 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆81 · Updated 2 years ago
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Updated last year
- ☆32 · Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆17 · Updated 10 months ago
- Recycling diverse models ☆46 · Updated 3 years ago
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021) ☆59 · Updated 4 years ago
- Unofficial implementation of the Selective Attention Transformer ☆20 · Updated last year
- Repository for "Model Merging by Uncertainty-Based Gradient Matching" (ICLR 2024) ☆29 · Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆32 · Updated 4 months ago
- ☆80 · Updated 3 years ago
- Bayesian Low-Rank Adaptation for Large Language Models ☆36 · Updated last year
- Source code of "Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models" ☆110 · Updated 2 years ago
- Code repository for the NeurIPS 2022 paper "Hyper-Representations as Generative Models: Sampling Unseen Neural Network Weights" ☆17 · Updated last year
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 · Updated 2 years ago
- ☆73 · Updated last year
- Latest Weight Averaging (NeurIPS HITY 2022) ☆32 · Updated 2 years ago
- Code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆30 · Updated 3 months ago
- Lightweight Adapting for Black-Box Large Language Models ☆25 · Updated last year
- Self-Supervised Alignment with Mutual Information ☆20 · Updated last year
- ☆34 · Updated 2 years ago
- Applies ROME and MEMIT to Mamba-S4 models ☆14 · Updated last year
- [ICML 2024 Spotlight] Fine-Tuning Pre-trained Large Language Models Sparsely ☆24 · Updated last year
- Official repository for our paper, Transformers Learn Higher-Order Optimization Methods for In-Context Learning: A Study with Linear Mode… ☆19 · Updated last year
- SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining (NeurIPS 2024) ☆39 · Updated last year
- ☆19 · Updated 10 months ago
- Code repo for the model organisms and convergent directions of EM papers ☆48 · Updated 4 months ago
- Official code repo for the paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs" ☆23 · Updated 9 months ago
- ☆37 · Updated last year
- ☆13 · Updated 7 months ago