AI-Hypercomputer / torchprime
torchprime is a reference model implementation for PyTorch on TPU.
☆39 · Updated last week
Alternatives and similar repositories for torchprime
Users interested in torchprime are comparing it to the libraries listed below.
- ☆122 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆268 · Updated 2 months ago
- A library for unit scaling in PyTorch ☆130 · Updated 2 months ago
- ☆15 · Updated 4 months ago
- Scalable and Performant Data Loading ☆304 · Updated 2 weeks ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- Load compute kernels from the Hub ☆293 · Updated this week
- ☆189 · Updated last week
- Accelerated First Order Parallel Associative Scan ☆190 · Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆540 · Updated 4 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆73 · Updated 3 weeks ago
- ☆91 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆164 · Updated 3 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆240 · Updated 4 months ago
- A set of Python scripts that makes your experience on TPU better ☆54 · Updated 2 weeks ago
- ☆173 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆242 · Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆667 · Updated last week
- Machine Learning eXperiment Utilities ☆47 · Updated 2 months ago
- JAX implementation of the Llama 2 model ☆218 · Updated last year
- seqax = sequence modeling + JAX ☆167 · Updated 2 months ago
- Efficient optimizers ☆265 · Updated this week
- Understand and test language model architectures on synthetic tasks. ☆229 · Updated last week
- Minimal yet performant LLM examples in pure JAX ☆181 · Updated 2 weeks ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆113 · Updated last month
- ☆149 · Updated 2 years ago
- Minimal (400 LOC) implementation, maximum (multi-node, FSDP) GPT training ☆132 · Updated last year
- ☆331 · Updated 3 weeks ago
- Supporting PyTorch FSDP for optimizers ☆84 · Updated 10 months ago