vvvm23 / mezo-jax
JAX implementation of "Fine-Tuning Language Models with Just Forward Passes"
☆19 Updated 2 years ago
Alternatives and similar repositories for mezo-jax
Users who are interested in mezo-jax are comparing it to the libraries listed below.
- Code for the paper "Function-Space Learning Rates" ☆20 Updated 3 weeks ago
- ☆32 Updated 8 months ago
- Meta-learning inductive biases in the form of useful conserved quantities. ☆37 Updated 2 years ago
- ☆34 Updated 9 months ago
- Official code for the paper "Compositional Generalization from First Principles" (NeurIPS 2023) ☆11 Updated last year
- A simple hypernetwork implementation in JAX using Haiku. ☆23 Updated 2 years ago
- Official code for the paper "Metadata Archaeology" ☆19 Updated 2 years ago
- ☆11 Updated last year
- Repository for the PopulAtion Parameter Averaging (PAPA) paper ☆26 Updated last year
- ☆60 Updated 3 years ago
- ☆32 Updated last year
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆13 Updated 3 months ago
- Fine-grained, dynamic control of neural network topology in JAX. ☆21 Updated last year
- Efficient scaling laws and collaborative pretraining. ☆16 Updated 4 months ago
- Code accompanying the paper "LaProp: a Better Way to Combine Momentum with Adaptive Gradient" ☆29 Updated 4 years ago
- ☆29 Updated 2 years ago
- Latest Weight Averaging (NeurIPS HITY 2022) ☆30 Updated 2 years ago
- ☆18 Updated last year
- ☆18 Updated 2 years ago
- ☆53 Updated 8 months ago
- ☆33 Updated 2 years ago
- The 2D discrete wavelet transform for JAX ☆43 Updated 2 years ago
- DiCE: The Infinitely Differentiable Monte-Carlo Estimator ☆31 Updated last year
- PyTorch implementation for "Long Horizon Temperature Scaling", ICML 2023 ☆20 Updated 2 years ago
- ModelDiff: A Framework for Comparing Learning Algorithms ☆57 Updated last year
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆46 Updated 2 years ago
- Official repository for the paper "Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks" ☆59 Updated 3 years ago
- Convolutions and more as einsum for PyTorch ☆16 Updated last year
- Official code for the paper "Attention as a Hypernetwork" ☆39 Updated last year
- Triton implementation of the HyperAttention algorithm ☆48 Updated last year