allenbai01 / transformers-as-statisticians
☆26 Updated last year
Related projects
Alternatives and complementary repositories for transformers-as-statisticians
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 Updated this week
- Distributional and Outlier Robust Optimization (ICML 2021) ☆27 Updated 3 years ago
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 Updated last year
- ☆13 Updated 6 months ago
- Code for Accelerated Linearized Laplace Approximation for Bayesian Deep Learning (ELLA, NeurIPS '22) ☆16 Updated 2 years ago
- ☆18 Updated last month
- Code for "Decision-Focused Learning without Differentiable Optimization: Learning Locally Optimized Decision Losses" ☆23 Updated 8 months ago
- ☆25 Updated 3 weeks ago
- ☆20 Updated 11 months ago
- Deep Learning & Information Bottleneck ☆50 Updated last year
- ☆25 Updated 4 months ago
- Code and data for the paper "Understanding Hidden Context in Preference Learning: Consequences for RLHF" ☆27 Updated 11 months ago
- ☆15 Updated 2 years ago
- Provably (and non-vacuously) bounding test error of deep neural networks under distribution shift with unlabeled test data. ☆9 Updated 8 months ago
- ☆17 Updated 2 years ago
- ☆59 Updated 3 years ago
- Official code for "Decoding-Time Language Model Alignment with Multiple Objectives". ☆14 Updated 3 weeks ago
- ☆34 Updated last year
- Code for the paper "Pretraining task diversity and the emergence of non-Bayesian in-context learning for regression" ☆20 Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆30 Updated 8 months ago
- [NeurIPS 2021] A Geometric Analysis of Neural Collapse with Unconstrained Features ☆53 Updated 2 years ago
- ☆13 Updated last year
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆26 Updated 3 years ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆35 Updated 2 years ago
- Gradient Estimation with Discrete Stein Operators (NeurIPS 2022) ☆17 Updated last year
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 Updated last month
- Test-time-training on nearest neighbors for large language models ☆27 Updated 7 months ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆24 Updated 7 months ago
- Official code for Generative Marginalization Models [ICML 2024] [SPGIM 2023 Workshop Oral] ☆20 Updated 3 months ago
- ☆33 Updated 9 months ago