thinking-machines-lab / manifolds
Supporting code for the blog post on modular manifolds.
☆101 · Updated last month
Alternatives and similar repositories for manifolds
Users who are interested in manifolds are comparing it to the repositories listed below.
- Supporting PyTorch FSDP for optimizers ☆83 · Updated 11 months ago
- Small Batch Size Training for Language Models ☆63 · Updated last month
- ☆41 · Updated last week
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆171 · Updated 4 months ago
- ☆91 · Updated last year
- WIP ☆93 · Updated last year
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- Accelerated First Order Parallel Associative Scan ☆189 · Updated last year
- Normalized Transformer (nGPT) ☆192 · Updated 11 months ago
- DeMo: Decoupled Momentum Optimization ☆196 · Updated 11 months ago
- ☆30 · Updated 11 months ago
- research impl of Native Sparse Attention (2502.11089) ☆63 · Updated 8 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆23 · Updated 5 months ago
- ☆68 · Updated 11 months ago
- ☆47 · Updated 3 weeks ago
- Minimal but scalable implementation of large language models in JAX ☆35 · Updated 2 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆131 · Updated last week
- H-Net Dynamic Hierarchical Architecture ☆80 · Updated 2 months ago
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆57 · Updated 8 months ago
- ☆61 · Updated last year
- JAX bindings for Flash Attention v2 ☆97 · Updated last week
- ☆53 · Updated last year
- ☆119 · Updated 5 months ago
- ☆34 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated 11 months ago
- Focused on fast experimentation and simplicity ☆75 · Updated 10 months ago
- ☆53 · Updated last year
- Muon fsdp 2 ☆44 · Updated 3 months ago
- FlashRNN - Fast RNN Kernels with I/O Awareness ☆144 · Updated 3 weeks ago