ESHyperscale / HyperscaleES
JAX Codebase for Evolutionary Strategies at the Hyperscale
☆216 · Updated last month
Alternatives and similar repositories for HyperscaleES
Users who are interested in HyperscaleES compare it to the libraries listed below.
- Getting crystal-like representations with harmonic loss ☆195 · Updated 9 months ago
- Small Batch Size Training for Language Models ☆80 · Updated 3 months ago
- ☆109 · Updated 6 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- ☆53 · Updated 2 years ago
- H-Net Dynamic Hierarchical Architecture ☆81 · Updated 4 months ago
- ☆62 · Updated last year
- ☆82 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 · Updated 4 months ago
- ☆214 · Updated 3 weeks ago
- Attention Kernels for Symmetric Power Transformers ☆128 · Updated 4 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆83 · Updated last year
- ☆162 · Updated 3 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆288 · Updated 2 months ago
- ☆167 · Updated 5 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 · Updated 2 months ago
- Supporting code for the blog post on modular manifolds. ☆115 · Updated 4 months ago
- Supporting PyTorch FSDP for optimizers ☆84 · Updated last year
- ☆55 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 · Updated last year
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆134 · Updated 3 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 · Updated 6 months ago
- Scalable and Stable Parallelization of Nonlinear RNNs ☆28 · Updated 3 months ago
- ☆123 · Updated 7 months ago
- Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.… ☆105 · Updated last year
- Unofficial but Efficient Implementation of "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" in JAX ☆92 · Updated 2 years ago
- ☆70 · Updated last year
- WIP ☆93 · Updated last year
- ☆91 · Updated last year