Position Coupling: Improving Length Generalization of Arithmetic Transformers Using Task Structure (NeurIPS 2024) + Arithmetic Transformers Can Length-Generalize in Both Operand Length and Count (ICLR 2025)
☆14Oct 26, 2025Updated 4 months ago
Alternatives and similar repositories for position-coupling
Users that are interested in position-coupling are comparing it to the libraries listed below.
- ☆13Dec 13, 2024Updated last year
- ☆11Jan 2, 2026Updated 2 months ago
- MSIT AI Fair(MAF)☆39Jan 8, 2026Updated 2 months ago
- ☆38Jan 8, 2026Updated 2 months ago
- AI Development in Evolving Policy [AI DEP]☆46Jul 7, 2025Updated 8 months ago
- Stick-breaking attention☆63Jul 1, 2025Updated 8 months ago
- Student materials for Stats/Datasci 507, Fall 2021.☆10Dec 10, 2021Updated 4 years ago
- ☆17Oct 31, 2023Updated 2 years ago
- ☆37Dec 12, 2023Updated 2 years ago
- This is the official implementation of "Lyra: Orchestrating Dual Correction in Automated Theorem Proving"☆15Jul 2, 2024Updated last year
- ☆13Jun 26, 2024Updated last year
- Code repository of AI-Endo☆16Jan 16, 2024Updated 2 years ago
- Code for the paper "Decomposing the Enigma: Subgoal-based Demonstration Learning for Formal Theorem Proving"☆19May 25, 2023Updated 2 years ago
- ☆20Oct 25, 2022Updated 3 years ago
- ☆12Oct 28, 2022Updated 3 years ago
- Code for the paper "PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning" (NeurIPS 2023)☆22Dec 8, 2023Updated 2 years ago
- T5 Fine-tuning on SQuAD Dataset for Question Generation☆12Feb 16, 2023Updated 3 years ago
- A case study approach to successful data science projects using Python, pandas, and scikit-learn☆10Jun 27, 2019Updated 6 years ago
- PyTorch implementation for "Gradient Surgery for Multi-Task Learning" https://arxiv.org/abs/2001.06782☆13Jul 6, 2020Updated 5 years ago
- Data Science Case Studies☆18Jan 31, 2021Updated 5 years ago
- This is the official implementation of our ICML 2024 paper "MultiMax: Sparse and Multi-Modal Attention Learning"☆22Feb 9, 2026Updated last month
- An optimization-based algorithm to accurately estimate the causal effects and robustly predict under distribution shifts. It leverages th…☆14Jul 10, 2024Updated last year
- ☆14Jul 6, 2021Updated 4 years ago
- Neural network sequence labeling model☆11Dec 28, 2019Updated 6 years ago
- Code for Implicit Regularization in Deep Matrix Factorization.☆40Jul 25, 2024Updated last year
- ☆32Mar 24, 2023Updated 2 years ago
- Tiny Tutorial on https://arxiv.org/abs/1703.04730☆14Nov 19, 2019Updated 6 years ago
- This is a poor man's framework to automate the creation of a CTFd instance, dynamically recreating challenges and the interface.☆10Mar 23, 2020Updated 6 years ago
- ☆84Aug 31, 2023Updated 2 years ago
- BibLint -- a system for fixing bibtex databases☆21Feb 2, 2026Updated last month
- ☆17Mar 22, 2021Updated 5 years ago
- Inverse Scaling in Test-Time Compute☆25Dec 3, 2025Updated 3 months ago
- Codebase for a Marimba playing robot☆15Nov 6, 2024Updated last year
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and …☆11Jun 18, 2024Updated last year
- ☆16Feb 2, 2022Updated 4 years ago
- ☆21Jun 1, 2025Updated 9 months ago
- Official implementation of the transformer (TF) architecture suggested in a paper entitled "Looped Transformers as Programmable Computers…☆37Apr 8, 2023Updated 2 years ago
- Achieve your marketing goals with the data analytics power of Python☆13Aug 21, 2019Updated 6 years ago
- [ICLR 2022] "Bayesian Modeling and Uncertainty Quantification for Learning to Optimize: What, Why, and How" by Yuning You, Yue Cao, Tianl…☆14Aug 19, 2022Updated 3 years ago