ESHyperscale / HyperscaleES
JAX Codebase for Evolutionary Strategies at the Hyperscale
☆188 · Updated last month
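The flagship repository targets evolution strategies (ES) in JAX at scale. For orientation, here is a minimal sketch of the kind of antithetic-sampling ES loop such a codebase builds on. It is written against plain JAX, not HyperscaleES's actual API, and the toy objective, population size, and learning rate are assumptions for illustration only.

```python
# Minimal, self-contained evolution strategy (OpenAI-ES style) in JAX.
# Illustrative sketch only: does not use HyperscaleES's API; the objective
# and all hyperparameters are assumed for the demo.
import jax
import jax.numpy as jnp

def fitness(theta):
    # Toy objective: maximize -||theta - 3||^2 (optimum at theta = 3).
    return -jnp.sum((theta - 3.0) ** 2)

def es_step(theta, key, pop_size=64, sigma=0.1, lr=0.05):
    # Antithetic sampling: evaluate +eps and -eps perturbations in pairs.
    eps = jax.random.normal(key, (pop_size // 2, theta.shape[0]))
    perturbs = jnp.concatenate([eps, -eps], axis=0)
    rewards = jax.vmap(fitness)(theta + sigma * perturbs)
    # Standardize rewards so the update is invariant to fitness scale.
    rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Monte Carlo estimate of the gradient of expected fitness.
    grad_est = (perturbs.T @ rewards) / (perturbs.shape[0] * sigma)
    return theta + lr * grad_est

theta = jnp.zeros(10)
key = jax.random.PRNGKey(0)
step = jax.jit(es_step)
for _ in range(200):
    key, subkey = jax.random.split(key)
    theta = step(theta, subkey)
print(fitness(theta))  # approaches 0 as theta approaches 3
```

Antithetic pairs and reward standardization are common variance-reduction choices in this family of methods; they are not specific to HyperscaleES.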
Alternatives and similar repositories for HyperscaleES
Users interested in HyperscaleES are comparing it to the libraries listed below.
- ☆105 · Updated 4 months ago
- Getting crystal-like representations with harmonic loss ☆193 · Updated 8 months ago
- ☆208 · Updated 4 months ago
- 🧱 Modula software package ☆316 · Updated 4 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 · Updated last year
- 📄 Small Batch Size Training for Language Models ☆68 · Updated 2 months ago
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" (see the gradient-filter sketch after this list) ☆103 · Updated 11 months ago
- Supporting PyTorch FSDP for optimizers ☆84 · Updated last year
- ☆62 · Updated last year
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆97 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆180 · Updated 5 months ago
- Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.… ☆105 · Updated 11 months ago
- ☆53 · Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆106 · Updated 3 weeks ago
- DeMo: Decoupled Momentum Optimization ☆197 · Updated last year
- ☆82 · Updated last year
- ☆162 · Updated 4 months ago
- Repository for code used in the xVal paper ☆146 · Updated last year
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- WIP ☆93 · Updated last year
- H-Net Dynamic Hierarchical Architecture ☆80 · Updated 3 months ago
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 · Updated 3 months ago
- ☆229 · Updated last year
- Attention Kernels for Symmetric Power Transformers ☆128 · Updated 2 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆277 · Updated 3 weeks ago
- MoE training for Me and You and maybe other people ☆239 · Updated this week
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- ☆56 · Updated last year
- ☆28 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆335 · Updated last month
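One entry above explores the Grokfast paper, whose core idea is to amplify the slow (low-frequency) component of the gradient via an exponential moving average before the optimizer step. Below is a minimal sketch of that idea as a generic gradient filter in JAX; the function name, hyperparameters, and update loop are assumptions for illustration and do not reflect the linked repository's code.

```python
# Simplified sketch of the Grokfast-EMA idea: keep an EMA of the gradients
# (the slow component) and add an amplified copy of it back to the raw
# gradient. Written as a generic JAX gradient transform over pytrees.
import jax
import jax.numpy as jnp

def grokfast_ema(grads, ema, alpha=0.98, lam=2.0):
    # ema <- alpha * ema + (1 - alpha) * grad   (slow gradient component)
    new_ema = jax.tree_util.tree_map(
        lambda e, g: alpha * e + (1.0 - alpha) * g, ema, grads)
    # Amplify the slow component: g_hat = g + lam * ema
    filtered = jax.tree_util.tree_map(
        lambda g, e: g + lam * e, grads, new_ema)
    return filtered, new_ema

# Usage inside a training step (params/grads are arbitrary pytrees):
params = {"w": jnp.ones((3,)), "b": jnp.zeros(())}
grads = {"w": jnp.full((3,), 0.1), "b": jnp.asarray(0.5)}
ema = jax.tree_util.tree_map(jnp.zeros_like, grads)
filtered, ema = grokfast_ema(grads, ema)
params = jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, filtered)
```

The filtered gradients can be fed to any optimizer; the choice of alpha and lam here is arbitrary and would normally be tuned per task.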