rbalestr-lab / stable-pretraining
☆24 Updated last week
Alternatives and similar repositories for stable-pretraining
Users who are interested in stable-pretraining are comparing it to the libraries listed below.
- ☆27 Updated last year
- [ICLR'25] Artificial Kuramoto Oscillatory Neurons ☆99 Updated 3 weeks ago
- ☆115 Updated 2 months ago
- ☆150 Updated last year
- Modern Fixed Point Systems using PyTorch ☆103 Updated last year
- Code and weights for the paper "Cluster and Predict Latent Patches for Improved Masked Image Modeling" ☆116 Updated 4 months ago
- Implementation of Diffusion Transformer (DiT) in JAX ☆292 Updated last year
- Code for the paper: Rotating Features for Object Discovery ☆53 Updated last year
- ☆275 Updated last year
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆93 Updated 5 months ago
- Deep Networks Grok All the Time and Here is Why ☆37 Updated last year
- A convenient way to trigger synchronizations to wandb / Weights & Biases if your compute nodes don't have internet! ☆83 Updated 3 weeks ago
- React + Next.js template for research websites (for PhD students, researchers, etc.) ☆187 Updated 7 months ago
- CIFAR-10 speedruns: 94% in 2.6 seconds and 96% in 27 seconds ☆288 Updated last month
- ☆52 Updated last year
- Flax (JAX) implementation of DeepSeek-R1-Distill-Qwen-1.5B with weights ported from Hugging Face. ☆22 Updated 6 months ago
- ViT Prisma is a mechanistic interpretability library for Vision and Video Transformers (ViTs). ☆301 Updated last month
- NF-Layers for constructing neural functionals. ☆88 Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆143 Updated 3 months ago
- ☆69 Updated last year
- Scalable and Stable Parallelization of Nonlinear RNNs ☆20 Updated last week
- 📄 Small Batch Size Training for Language Models ☆57 Updated last week
- WIP ☆94 Updated last year
- Minimal GPT (~350 lines with a simple task to test it) ☆62 Updated 8 months ago
- The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction. ☆39 Updated 4 months ago
- Flow-matching algorithms in JAX ☆104 Updated last year
- 🧱 Modula software package ☆231 Updated 2 weeks ago
- σ-GPT: A New Approach to Autoregressive Models ☆67 Updated last year
- A simple library for scaling up JAX programs ☆143 Updated 10 months ago
- The boundary of neural network trainability is fractal ☆215 Updated last year