Bond1995 / Markov
Code for experiments on transformers using Markovian data.
☆11 Updated 4 months ago
Alternatives and similar repositories for Markov:
Users who are interested in Markov are comparing it to the repositories listed below.
- Universal Neurons in GPT2 Language Models ☆27 Updated 10 months ago
- Deep Networks Grok All the Time and Here is Why ☆33 Updated 10 months ago
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers ☆17 Updated last month
- Efficient Scaling laws and collaborative pretraining. ☆16 Updated 2 months ago
- ☆15 Updated last year
- ☆18 Updated 8 months ago
- ☆31 Updated 11 months ago
- Source code for the paper "Positional Attention: Out-of-Distribution Generalization and Expressivity for Neural Algorithmic Reasoning" ☆14 Updated 2 months ago
- This repo is built to facilitate the training and analysis of autoregressive transformers on maze-solving tasks. ☆27 Updated 7 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆63 Updated 6 months ago
- ☆31 Updated 10 months ago
- ☆30 Updated 5 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆25 Updated 9 months ago
- ☆13 Updated 2 years ago
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated last year
- Official Code Repository for the paper "Key-value memory in the brain" ☆24 Updated last month
- ☆26 Updated last year
- Lottery Ticket Adaptation ☆39 Updated 4 months ago
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆27 Updated 3 years ago
- ☆25 Updated this week
- ☆31 Updated 2 months ago
- JAX implementation of "Fine-Tuning Language Models with Just Forward Passes" ☆19 Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆53 Updated 7 months ago
- ☆21 Updated 2 months ago
- Official repository for the paper "Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules" (… ☆21 Updated 2 years ago
- This is the official repository for the "Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP" paper acce… ☆21 Updated 11 months ago
- ☆30 Updated 4 months ago
- A simple hypernetwork implementation in JAX using Haiku. ☆23 Updated 2 years ago
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆26 Updated 11 months ago
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆16 Updated 3 weeks ago