AI-Hypercomputer / torchprime
torchprime is a reference model implementation for PyTorch on TPU.
☆44 · Updated last week
Alternatives and similar repositories for torchprime
Users interested in torchprime are comparing it to the libraries listed below.
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 · Updated last month
- Load compute kernels from the Hub ☆359 · Updated last week
- ☆342 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆278 · Updated last month
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- ☆124 · Updated last year
- ☆192 · Updated this week
- ☆552 · Updated last year
- Fast low-bit matmul kernels in Triton ☆423 · Updated last month
- Minimal yet performant LLM examples in pure JAX ☆226 · Updated 2 weeks ago
- Scalable and Performant Data Loading ☆362 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆469 · Updated this week
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆594 · Updated 5 months ago
- ring-attention experiments ☆161 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆260 · Updated 3 months ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆244 · Updated 7 months ago
- ☆92 · Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- A library for unit scaling in PyTorch ☆133 · Updated 6 months ago
- MoE training for Me and You and maybe other people ☆319 · Updated 2 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch ☆549 · Updated 8 months ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆331 · Updated 2 months ago
- Google TPU optimizations for transformers models ☆132 · Updated 3 weeks ago
- A set of Python scripts that makes your experience on TPU better ☆55 · Updated 4 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆249 · Updated 11 months ago
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆690 · Updated this week
- ☆151 · Updated last week
- Pax is a Jax-based machine learning framework for training large-scale models. Pax allows for advanced and fully configurable experimenta… ☆544 · Updated this week
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆213 · Updated this week
- Accelerated First Order Parallel Associative Scan ☆193 · Updated last week