aryol / inductive-scratchpad

Implementation for our paper "How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad"

☆11

Alternatives and similar repositories for inductive-scratchpad

Users who are interested in inductive-scratchpad are comparing it to the libraries listed below.

sjunhongshen / DASH
☆23 Updated 2 years ago
JeanKaddour / LAWA
Latest Weight Averaging (NeurIPS HITY 2022)
☆31 Updated 2 years ago
AndyShih12 / LongHorizonTemperatureScaling
PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023
☆20 Updated 2 years ago
MadryLab / modeldiff
ModelDiff: A Framework for Comparing Learning Algorithms
☆59 Updated last year
GFNOrg / GFlowNet-EM
Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior.
☆41 Updated last year
YannDubs / Invariant-Self-Supervised-Learning
PyTorch code for "Improving Self-Supervised Learning by Characterizing Idealized Representations"
☆41 Updated 2 years ago
Ping-C / optimizer
This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine…
☆37 Updated 2 years ago
SamsungSAILMontreal / ghn3
Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023]
☆36 Updated 10 months ago
ayulockin / LossLandscape
Explores the ideas presented in Deep Ensembles: A Loss Landscape Perspective (https://arxiv.org/abs/1912.02757) by Stanislav Fort, Huiyi …
☆65 Updated 4 years ago
facebookresearch / ModelRatatouille
Recycling diverse models
☆45 Updated 2 years ago
JonasGeiping / dataaugs
☆18 Updated 2 years ago
aks2203 / deep-thinking
A centralized place for deep thinking code and experiments
☆85 Updated last year
lxuechen / ml-swissknife
An ML research codebase built with friends :)
☆24 Updated 10 months ago
stanislavfort / dissect-git-re-basin
Replicating and dissecting the git-re-basin project in one-click-replication Colabs
☆36 Updated 2 years ago
shikaiqiu / compute-better-spent
☆53 Updated 9 months ago
google-deepmind / ssl_hsic
☆37 Updated 11 months ago
yilundu / improved_contrastive_divergence
[ICML'21] Improved Contrastive Divergence Training of Energy Based Models
☆63 Updated 3 years ago
nick11roberts / XD
☆12 Updated 3 years ago
KellerJordan / REPAIR
Code release for REPAIR: REnormalizing Permuted Activations for Interpolation Repair
☆48 Updated last year
alexrame / diwa
DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization
☆31 Updated 2 years ago
gregorbachmann / scaling_mlps
☆51 Updated last year
JeanKaddour / NoTrainNoGain
Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
☆80 Updated last year
sjunhongshen / ORCA
Official implementation of ORCA proposed in the paper "Cross-Modal Fine-Tuning: Align then Refine"
☆71 Updated last year
tml-epfl / understanding-sam
Towards Understanding Sharpness-Aware Minimization [ICML 2022]
☆35 Updated 3 years ago
GFNOrg / EB_GFN
Code for our paper "Generative Flow Networks for Discrete Probabilistic Modeling"
☆84 Updated 2 years ago
aniruddhraghu / meta-pretraining
Code accompanying paper: Meta-Learning to Improve Pre-Training
☆37 Updated 3 years ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆40 Updated 9 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆77 Updated 8 months ago
GFNOrg / GFN_vs_HVI
☆9 Updated 2 years ago
xu-ji / information-bottleneck
Deep Learning & Information Bottleneck
☆61 Updated 2 years ago