AI-Hypercomputer / maxdiffusion
☆197 · Updated this week
Alternatives and similar repositories for maxdiffusion:
Users interested in maxdiffusion are comparing it to the libraries listed below:
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆54 · Updated last month
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome) ☆299 · Updated this week
- Google TPU optimizations for transformers models ☆104 · Updated 2 months ago
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ☆122 · Updated 11 months ago
- Focused on fast experimentation and simplicity ☆69 · Updated 3 months ago
- JAX implementation of the Llama 2 model ☆216 · Updated last year
- JAX-Toolbox ☆289 · Updated this week
- PyTorch per-step fault tolerance (actively under development) ☆267 · Updated this week
- Scalable and Performant Data Loading ☆230 · Updated this week
- Implementation of Flash Attention in JAX (a blockwise online-softmax sketch follows this list) ☆206 · Updated last year
- Efficient optimizers ☆184 · Updated 2 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- ring-attention experiments ☆128 · Updated 5 months ago
- Inference code for LLaMA models in JAX ☆116 · Updated 10 months ago
- jax-triton contains integrations between JAX and OpenAI Triton ☆386 · Updated last week
- Pax is a JAX-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation ☆484 · Updated last week
- Supporting PyTorch FSDP for optimizers ☆79 · Updated 3 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash Attention ☆232 · Updated 2 weeks ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆100 · Updated 4 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 3 months ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch (a JAX sketch of the ring communication pattern follows this list) ☆506 · Updated 4 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆223 · Updated last month
- Simple implementation of muP, based on the Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam (see the scaling-rule sketch after this list) ☆73 · Updated 7 months ago
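
A few of the entries above name techniques that a one-line description undersells, so brief sketches follow. First, the "Flash Attention in JAX" entry: Flash Attention's core idea is blockwise attention with an online softmax, so the full [seq, seq] score matrix is never materialized. Below is a minimal pure-JAX sketch of that bookkeeping; the function name and block size are illustrative, not that repository's API, and real Flash Attention kernels additionally fuse this loop on-chip.

```python
import jax
import jax.numpy as jnp

def blockwise_attention(q, k, v, block_size=128):
    """q, k, v: [seq_len, head_dim]. K/V are consumed one block at a time."""
    seq_len, head_dim = q.shape
    assert seq_len % block_size == 0, "sketch assumes divisible seq_len"
    scale = head_dim ** -0.5
    num_blocks = seq_len // block_size

    def eat_kv_block(carry, kv_block):
        m_prev, l_prev, o_prev = carry           # running max, normalizer, output
        k_blk, v_blk = kv_block
        s = (q @ k_blk.T) * scale                # [seq, block] partial scores
        m_new = jnp.maximum(m_prev, s.max(axis=-1, keepdims=True))
        p = jnp.exp(s - m_new)                   # probs relative to the new max
        corr = jnp.exp(m_prev - m_new)           # rescale the old accumulators
        l_new = l_prev * corr + p.sum(axis=-1, keepdims=True)
        o_new = o_prev * corr + p @ v_blk
        return (m_new, l_new, o_new), None

    k_blocks = k.reshape(num_blocks, block_size, head_dim)
    v_blocks = v.reshape(num_blocks, block_size, head_dim)
    init = (jnp.full((seq_len, 1), -jnp.inf),    # running max starts at -inf
            jnp.zeros((seq_len, 1)),             # softmax normalizer
            jnp.zeros((seq_len, head_dim)))      # unnormalized output
    (m, l, o), _ = jax.lax.scan(eat_kv_block, init, (k_blocks, v_blocks))
    return o / l                                 # final softmax normalization
```

(Non-causal for brevity; a causal variant additionally masks each score block.)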
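
The 💍 Ring Attention entry distributes the same accumulators across devices: each device keeps its query shard fixed while the K/V shards rotate around the device ring. Here is a hedged sketch of just that communication pattern, assuming one sequence block per device; `make_ring_attention` and the pmap usage are illustrative, not code from that repository.

```python
import jax
import jax.numpy as jnp

def make_ring_attention(n_dev, axis_name="ring"):
    # Static ring permutation: device i sends its K/V shard to device i+1.
    perm = [(i, (i + 1) % n_dev) for i in range(n_dev)]

    def ring_attention_shard(q, k, v):
        # q, k, v: this device's [block, head_dim] shards of the sequence.
        scale = q.shape[-1] ** -0.5
        m = jnp.full((q.shape[0], 1), -jnp.inf)
        l = jnp.zeros((q.shape[0], 1))
        o = jnp.zeros_like(q)
        for _ in range(n_dev):                   # one full pass around the ring
            s = (q @ k.T) * scale
            m_new = jnp.maximum(m, s.max(axis=-1, keepdims=True))
            p = jnp.exp(s - m_new)
            corr = jnp.exp(m - m_new)
            l = l * corr + p.sum(axis=-1, keepdims=True)
            o = o * corr + p @ v
            m = m_new
            # Hand our K/V shard to the next device, receive from the previous.
            k = jax.lax.ppermute(k, axis_name, perm)
            v = jax.lax.ppermute(v, axis_name, perm)
        return o / l

    return ring_attention_shard

# Usage sketch: shards of shape [n_dev, block, head_dim], one block per device.
# attn = jax.pmap(make_ring_attention(jax.local_device_count()), axis_name="ring")
# out = attn(q_shards, k_shards, v_shards)
```

After n_dev rotations each shard is back on its owner; a real implementation also overlaps the ppermute with the block computation.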
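
Finally, the muP entry's "SGD only" warning comes from the Spectral Condition it cites: each weight matrix should keep spectral norm on the order of sqrt(fan_out / fan_in), both at initialization and across updates, and for SGD that works out to scaling the per-layer learning rate by fan_out / fan_in (the corresponding Adam rule is different, hence the warning). A minimal sketch under those assumptions; both function names are illustrative.

```python
import jax

def spectral_init(key, fan_in, fan_out):
    """Gaussian init targeting spectral norm ~ sqrt(fan_out / fan_in).
    A Gaussian matrix with entry std s has spectral norm roughly
    s * (sqrt(fan_in) + sqrt(fan_out)), so solve for s."""
    target = (fan_out / fan_in) ** 0.5
    std = target / (fan_in ** 0.5 + fan_out ** 0.5)
    return std * jax.random.normal(key, (fan_in, fan_out))

def spectral_sgd_lr(base_lr, fan_in, fan_out):
    """Per-layer SGD learning rate; the fan_out / fan_in factor keeps the
    *updates* on the same spectral scale as the weights. SGD only."""
    return base_lr * fan_out / fan_in
```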