ESHyperscale / HyperscaleES
JAX codebase for Evolutionary Strategies at the Hyperscale
☆207 · Updated 2 weeks ago
Alternatives and similar repositories for HyperscaleES
Users who are interested in HyperscaleES are comparing it to the libraries listed below.
- ☆108 · Updated 5 months ago
- Getting crystal-like representations with harmonic loss ☆194 · Updated 9 months ago
- ☆213 · Updated 4 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated last year
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆284 · Updated last month
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- H-Net Dynamic Hierarchical Architecture ☆80 · Updated 3 months ago
- 📄 Small Batch Size Training for Language Models ☆79 · Updated 3 months ago
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆97 · Updated 5 months ago
- Attention Kernels for Symmetric Power Transformers ☆128 · Updated 3 months ago
- ☆82 · Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆149 · Updated 3 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆132 · Updated 2 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 · Updated last month
- Mixture of A Million Experts ☆52 · Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆86 · Updated 3 months ago
- ☆158 · Updated 2 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆198 · Updated last year
- ☆62 · Updated last year
- A MAD laboratory to improve AI architecture designs 🧪 ☆136 · Updated last year
- Supporting code for the blog post on modular manifolds. ☆109 · Updated 3 months ago
- ☆164 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆181 · Updated 6 months ago
- PyTorch implementation of models from the Zamba2 series. ☆186 · Updated 11 months ago
- ☆53 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- WIP ☆93 · Updated last year
- ☆131 · Updated last year
- ☆70 · Updated last year
- MoE training for Me and You and maybe other people ☆315 · Updated last week