AllanYangZhou / universal_neural_functional
☆51 Updated last year
Alternatives and similar repositories for universal_neural_functional
Users who are interested in universal_neural_functional are comparing it to the libraries listed below.
- Scalable and Stable Parallelization of Nonlinear RNNs ☆19 Updated 6 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆93 Updated 5 months ago
- A centralized place for deep thinking code and experiments ☆86 Updated 2 years ago
- Sparse and discrete interpretability tool for neural networks ☆63 Updated last year
- ☆56 Updated 10 months ago
- Deep Networks Grok All the Time and Here is Why ☆37 Updated last year
- The Energy Transformer block, in JAX ☆59 Updated last year
- This repository includes code to reproduce the tables in "Loss Landscapes are All You Need: Neural Network Generalization Can Be Explaine… ☆37 Updated 2 years ago
- NF-Layers for constructing neural functionals. ☆88 Updated last year
- ☆115 Updated 2 months ago
- Implementation of PSGD optimizer in JAX ☆34 Updated 7 months ago
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- ☆53 Updated last year
- 📄 Small Batch Size Training for Language Models ☆43 Updated this week
- A simple library for scaling up JAX programs ☆143 Updated 9 months ago
- ☆31 Updated 9 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆81 Updated 9 months ago
- Official PyTorch Implementation of the Longhorn Deep State Space Model ☆54 Updated 8 months ago
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆87 Updated last year
- ☆68 Updated 2 years ago
- ☆34 Updated last year
- ☆233 Updated 6 months ago
- ☆40 Updated 3 years ago
- ICML 2022: Learning Iterative Reasoning through Energy Minimization ☆47 Updated 2 years ago
- ☆30 Updated 5 months ago
- 🧱 Modula software package ☆225 Updated last week
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆129 Updated 2 years ago
- Code for "Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?" [ICML 2023] ☆36 Updated 11 months ago
- ☆69 Updated last year
- ☆177 Updated last week