AI-Hypercomputer / torchprime
torchprime is a reference model implementation for PyTorch on TPU.
☆19 · Updated this week
Alternatives and similar repositories for torchprime
Users interested in torchprime are comparing it to the libraries listed below:
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆60 · Updated last month
- Google TPU optimizations for transformers models ☆109 · Updated 3 months ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated last year
- PyTorch/XLA SPMD test code on Google TPU ☆23 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆245 · Updated this week
- Load compute kernels from the Hub ☆119 · Updated last week
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) ☆14 · Updated 9 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated 9 months ago
- Code for Zero-Shot Tokenizer Transfer ☆127 · Updated 4 months ago
- A library for unit scaling in PyTorch ☆125 · Updated 5 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆92 · Updated 10 months ago
- Applied AI experiments and examples for PyTorch ☆265 · Updated 2 weeks ago
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. ☆17 · Updated last month
- Inference code for LLaMA models in JAX ☆118 · Updated 11 months ago
- JAX implementation of the Llama 2 model ☆218 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton ☆67 · Updated 9 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆195 · Updated this week
- Fast low-bit matmul kernels in Triton ☆299 · Updated this week
- Collection of autoregressive model implementations ☆85 · Updated 3 weeks ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆189 · Updated 9 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆173 · Updated last week
- Cataloging released Triton kernels. ☆221 · Updated 4 months ago
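One entry above refers to ZeRO-1 optimizer sharding: each data-parallel worker keeps the optimizer state (e.g. momentum buffers) for only its 1/N shard of the parameters, updates that shard, and the updated shards are all-gathered. A toy, single-process sketch of the idea in plain Python (function and variable names are invented for illustration, not taken from either linked implementation):

```python
def shard(xs, rank, world_size):
    """Contiguous 1-D shard of xs owned by `rank` (assumes an even split)."""
    n = len(xs) // world_size
    return xs[rank * n:(rank + 1) * n]

def zero1_step(params, grads, momenta, world_size, lr=0.1, beta=0.9):
    """One SGD-with-momentum step with ZeRO-1 style sharding.

    `momenta` is a list of per-rank momentum shards: only 1/world_size
    of the optimizer state lives on each "rank" (simulated serially here).
    """
    new_shards = []
    for rank in range(world_size):
        p_shard = shard(params, rank, world_size)
        g_shard = shard(grads, rank, world_size)
        # Each rank updates only its own shard of state and parameters.
        m_new = [beta * m + g for m, g in zip(momenta[rank], g_shard)]
        p_new = [p - lr * m for p, m in zip(p_shard, m_new)]
        momenta[rank] = m_new
        new_shards.append(p_new)
    # "All-gather": every rank ends up with the full updated parameters,
    # while optimizer state stays partitioned across ranks.
    return [p for s in new_shards for p in s]

params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
momenta = [[0.0, 0.0], [0.0, 0.0]]  # 2 ranks, each holds half the state
params = zero1_step(params, grads, momenta, world_size=2)
print(params)  # each param decreased by lr * momentum = 0.05
```

The memory saving is the point: full gradients and parameters are still replicated, but per-parameter optimizer state (which can dominate memory for Adam-style optimizers) is split 1/N per worker.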