AI-Hypercomputer / cloud-tpu-monitoring-debugging
☆13 Updated 6 months ago
Alternatives and similar repositories for cloud-tpu-monitoring-debugging
Users who are interested in cloud-tpu-monitoring-debugging compare it to the libraries listed below.
- Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimenta… ☆535 Updated 2 weeks ago
- xpk (Accelerated Processing Kit, pronounced x-p-k) is a software tool to help Cloud developers orchestrate training jobs on accelerat… ☆141 Updated this week
- ☆188 Updated 2 weeks ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆375 Updated 3 months ago
- ☆45 Updated 3 weeks ago
- Orbax provides common checkpointing and persistence utilities for JAX users ☆424 Updated this week
- ☆146 Updated last month
- Modular, scalable library to train ML models ☆164 Updated this week
- Legible, Scalable, Reproducible Foundation Models with Named Tensors and Jax ☆658 Updated this week
- JAX-Toolbox ☆335 Updated this week
- Library for reading and processing ML training data. ☆535 Updated this week
- Accelerate, Optimize performance with streamlined training and serving options with JAX. ☆310 Updated this week
- ☆261 Updated this week
- ☆330 Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton ☆419 Updated 2 weeks ago
- ☆279 Updated last year
- ☆534 Updated last year
- JAX Synergistic Memory Inspector ☆179 Updated last year
- Train very large language models in Jax. ☆208 Updated last year
- A Jax-based library for building transformers, includes implementations of GPT, Gemma, LlaMa, Mixtral, Whisper, SWin, ViT and more. ☆291 Updated last year
- ☆16 Updated 6 months ago
- seqax = sequence modeling + JAX ☆167 Updated last month
- CLU lets you write beautiful training loops in JAX. ☆355 Updated 2 months ago
- Minimal yet performant LLM examples in pure JAX ☆158 Updated this week
- Recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud. ☆84 Updated this week
- MLCommons Algorithmic Efficiency is a benchmark and competition measuring neural network training speedups due to algorithmic improvement… ☆396 Updated this week
- Cost aware hyperparameter tuning algorithm ☆169 Updated last year
- Named Tensors for Legible Deep Learning in JAX ☆203 Updated this week
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆123 Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆401 Updated 2 weeks ago