misko / human_descent
☆37 · Updated last month
Alternatives and similar repositories for human_descent
Users who are interested in human_descent are comparing it to the libraries listed below.
- Getting crystal-like representations with harmonic loss ☆195 · Updated 10 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆107 · Updated 2 months ago
- 🧱 Modula software package ☆322 · Updated 5 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆306 · Updated last year
- ☆291 · Updated last year
- The boundary of neural network trainability is fractal ☆221 · Updated last year
- ☆21 · Updated 10 months ago
- Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. ☆175 · Updated 2 years ago
- The history files when recording human interaction while solving ARC tasks ☆117 · Updated 2 weeks ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆150 · Updated 4 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆198 · Updated last year
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆98 · Updated 6 months ago
- Minimal yet performant LLM examples in pure JAX ☆240 · Updated 3 weeks ago
- ☆215 · Updated last month
- Graph neural networks in JAX. ☆68 · Updated last year
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆352 · Updated 2 months ago
- Efficient optimizers ☆281 · Updated last month
- Flow-matching algorithms in JAX ☆115 · Updated last year
- Bare-bones implementations of some generative models in Jax: diffusion, normalizing flows, consistency models, flow matching, (beta)-VAEs… ☆141 · Updated 2 years ago
- Automatic gradient descent ☆217 · Updated 2 years ago
- WIP ☆93 · Updated last year
- Dion optimizer algorithm ☆431 · Updated 3 weeks ago
- σ-GPT: A New Approach to Autoregressive Models ☆70 · Updated last year
- Simple Transformer in Jax ☆142 · Updated last year
- ☆246 · Updated last year
- Jax Codebase for Evolutionary Strategies at the Hyperscale ☆218 · Updated last month
- Code for the Fractured Entangled Representation Hypothesis position paper! ☆221 · Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆198 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Updated last year
- ☆46 · Updated last week