misko / human_descent
☆32 Updated this week
Related projects
Alternatives and complementary repositories for human_descent
- Graph neural networks in JAX. ☆67 Updated 4 months ago
- ☆20 Updated last month
- Simple Transformer in Jax ☆115 Updated 4 months ago
- A package for defining deep learning models using categorical algebraic expressions. ☆56 Updated 3 months ago
- Scalable neural net training via automatic normalization in the modular norm. ☆119 Updated 2 months ago
- ☆89 Updated this week
- Flow-matching algorithms in JAX ☆74 Updated 3 months ago
- The boundary of neural network trainability is fractal ☆161 Updated 9 months ago
- ☆40 Updated 4 months ago
- Bare-bones implementations of some generative models in Jax: diffusion, normalizing flows, consistency models, flow matching, (beta)-VAEs… ☆123 Updated 10 months ago
- ☆58 Updated 2 years ago
- Visualizations of the theory behind diffusion models. ☆74 Updated 6 months ago
- Resources from the EleutherAI Math Reading Group ☆50 Updated last month
- A MAD laboratory to improve AI architecture designs 🧪 ☆95 Updated 6 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆252 Updated 5 months ago
- WIP ☆89 Updated 2 months ago
- Muon optimizer for neural networks: >30% extra sample efficiency, <3% wallclock overhead ☆69 Updated this week
- ☆113 Updated last week
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆84 Updated 2 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆84 Updated last week
- Brain-Inspired Modular Training (BIMT), a method for making neural networks more modular and interpretable. ☆162 Updated last year
- This is a port of Mistral-7B model in JAX ☆30 Updated 4 months ago
- ☆53 Updated 9 months ago
- Your favourite classical machine learning algos on the GPU/TPU ☆20 Updated last month
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆177 Updated 5 months ago
- 94% on CIFAR-10 in 2.59 seconds 💨 96% in 27 seconds ☆168 Updated this week
- Tensor Network Library with Autograd ☆148 Updated last week
- ☆122 Updated this week
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆93 Updated last week
- seqax = sequence modeling + JAX ☆132 Updated 3 months ago