ESHyperscale / HyperscaleES
JAX Codebase for Evolutionary Strategies at the Hyperscale
☆40 Updated last week
Alternatives and similar repositories for HyperscaleES
Users who are interested in HyperscaleES are comparing it to the libraries listed below.
- ☆201 Updated 3 months ago
- This repo contains the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning" ☆261 Updated this week
- H-Net Dynamic Hierarchical Architecture ☆80 Updated 2 months ago
- ☆105 Updated 4 months ago
- Implementation of SOAR ☆43 Updated 2 months ago
- Attention Kernels for Symmetric Power Transformers ☆128 Updated 2 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 Updated 11 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆106 Updated 2 months ago
- Storing long contexts in tiny caches with self-study ☆217 Updated last month
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 Updated 2 months ago
- ☆157 Updated 3 months ago
- ☆82 Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆194 Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆147 Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 Updated 11 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆103 Updated 11 months ago
- ☆143 Updated 2 months ago
- ☆53 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆174 Updated 5 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆111 Updated 7 months ago
- ☆128 Updated 11 months ago
- ☆28 Updated last year
- DeMo: Decoupled Momentum Optimization ☆197 Updated 11 months ago
- σ-GPT: A New Approach to Autoregressive Models ☆70 Updated last year
- Getting crystal-like representations with harmonic loss ☆192 Updated 7 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆87 Updated 3 years ago
- 📄 Small Batch Size Training for Language Models ☆64 Updated last month
- The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag… ☆112 Updated last month
- 🧱 Modula software package ☆307 Updated 3 months ago
- gzip Predicts Data-dependent Scaling Laws ☆34 Updated last year