Supporting code for the blog post on modular manifolds.
☆121 · Sep 26, 2025 · Updated 7 months ago
Alternatives and similar repositories for manifolds
Users that are interested in manifolds are comparing it to the libraries listed below.
- Code for implementing central flows · ☆44 · Sep 5, 2025 · Updated 8 months ago
- Code for the paper "Function-Space Learning Rates" · ☆24 · Jun 3, 2025 · Updated 11 months ago
- Code for "What really matters in matrix-whitening optimizers?" · ☆24 · Oct 31, 2025 · Updated 6 months ago
- ☆19 · Dec 4, 2025 · Updated 5 months ago
- ☆33 · Oct 4, 2024 · Updated last year
- Schedule free optimiser implemented in JAX using Optimistix · ☆15 · May 29, 2024 · Updated last year
- ☆37 · Feb 26, 2024 · Updated 2 years ago
- Parallel Associative Scan for Language Models · ☆18 · Jan 8, 2024 · Updated 2 years ago
- ☆124 · May 28, 2024 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆25 · Jun 6, 2024 · Updated last year
- 🧱 Modula software package · ☆329 · Aug 18, 2025 · Updated 9 months ago
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- train with kittens! · ☆66 · Oct 25, 2024 · Updated last year
- diffusers with search engine · ☆12 · Jan 13, 2026 · Updated 4 months ago
- [Poster; ICLR 2026] [Oral; Neurips OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers · ☆16 · Apr 15, 2026 · Updated last month
- ☆69 · Mar 21, 2025 · Updated last year
- ☆26 · Feb 20, 2026 · Updated 3 months ago
- [ACL 2025] Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models · ☆39 · Nov 4, 2025 · Updated 6 months ago
- Accelerated First Order Parallel Associative Scan · ☆197 · Jan 7, 2026 · Updated 4 months ago
- Code release for "FlashBias: Fast Computation of Attention with Bias" (NeurIPS 2025), https://arxiv.org/abs/2505.12044 · ☆28 · Nov 17, 2025 · Updated 6 months ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" · ☆113 · Oct 11, 2025 · Updated 7 months ago
- Train toy models using multi-token prediction objective · ☆14 · Apr 18, 2026 · Updated last month
- Grokking on modular arithmetic in less than 150 epochs in MLX · ☆15 · Oct 24, 2024 · Updated last year
- Experiment of using Tangent to autodiff triton · ☆82 · Jan 22, 2024 · Updated 2 years ago
- ☆20 · May 30, 2024 · Updated last year
- A library for unit scaling in PyTorch · ☆133 · Jul 11, 2025 · Updated 10 months ago
- Unofficial implementation of paper: Exploring the Space of Key-Value-Query Models with Intention · ☆12 · May 24, 2023 · Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism · ☆78 · Jun 17, 2024 · Updated last year
- ☆22 · Dec 15, 2023 · Updated 2 years ago
- Aggregating embeddings over time · ☆32 · Jan 19, 2023 · Updated 3 years ago
- Triton-based implementation of Sparse Mixture of Experts. · ☆273 · Oct 3, 2025 · Updated 7 months ago
- Combining SOAP and MUON · ☆21 · Feb 11, 2025 · Updated last year
- ☆45 · Nov 1, 2025 · Updated 6 months ago
- ☆13 · Jun 3, 2024 · Updated last year
- ☆15 · Dec 5, 2019 · Updated 6 years ago
- ☆114 · Aug 26, 2024 · Updated last year
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" · ☆18 · Mar 15, 2024 · Updated 2 years ago
- ☆11 · Oct 11, 2023 · Updated 2 years ago
- ☆24 · Oct 15, 2024 · Updated last year