Bond1995 / Markov
Code for experiments on transformers using Markovian data.
☆14 · Updated 5 months ago
Alternatives and similar repositories for Markov
Users who are interested in Markov are comparing it to the repositories listed below.
- Universal Neurons in GPT2 Language Models ☆29 · Updated 11 months ago
- ☆19 · Updated 10 months ago
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆27 · Updated 3 years ago
- Official code for the paper "Attention as a Hypernetwork" ☆33 · Updated 10 months ago
- ☆13 · Updated 2 years ago
- Efficient Scaling laws and collaborative pretraining. ☆16 · Updated 3 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆66 · Updated 7 months ago
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining" ☆15 · Updated last month
- ☆14 · Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- Deep Networks Grok All the Time and Here is Why ☆34 · Updated last year
- ☆52 · Updated 11 months ago
- ☆22 · Updated 3 months ago
- ☆32 · Updated 7 months ago
- ☆31 · Updated 6 months ago
- Unofficial Implementation of Selective Attention Transformer ☆16 · Updated 6 months ago
- ☆31 · Updated 4 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆73 · Updated 6 months ago
- Code for the paper "Function-Space Learning Rates" ☆20 · Updated last month
- Code for the paper "Data Feedback Loops: Model-driven Amplification of Dataset Biases" ☆16 · Updated 2 years ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 · Updated last year
- ☆18 · Updated last month
- Code for GFlowNet-EM, a novel algorithm for fitting latent variable models with compositional latents and an intractable true posterior. ☆40 · Updated last year
- A modern look at the relationship between sharpness and generalization [ICML 2023] ☆43 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 11 months ago
- Code for "Theoretical Foundations of Deep Selective State-Space Models" (NeurIPS 2024) ☆12 · Updated 4 months ago
- ☆26 · Updated last year
- Code for reproducing our paper "Low Rank Adapting Models for Sparse Autoencoder Features" ☆10 · Updated last month
- ☆24 · Updated 3 months ago
- ☆31 · Updated last year