AI-Hypercomputer / torchprime
torchprime is a reference model implementation for PyTorch on TPU.
☆19 · Updated this week
Alternatives and similar repositories for torchprime
Users interested in torchprime are comparing it to the libraries listed below:
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆60 · Updated last month
- Google TPU optimizations for transformers models ☆109 · Updated 3 months ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated last year
- PyTorch/XLA SPMD test code on Google TPU ☆23 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆245 · Updated this week
- Load compute kernels from the Hub ☆119 · Updated last week
- Various transformers for FSDP research ☆37 · Updated 2 years ago
- A fork of the PEFT library, supporting Robust Adaptation (RoSA) ☆14 · Updated 9 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated 9 months ago
- Code for Zero-Shot Tokenizer Transfer ☆127 · Updated 4 months ago
- A library for unit scaling in PyTorch ☆125 · Updated 5 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆92 · Updated 10 months ago
- Applied AI experiments and examples for PyTorch ☆265 · Updated 2 weeks ago
- 🤗 Transformers: State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch. ☆17 · Updated last month
- Inference code for LLaMA models in JAX ☆118 · Updated 11 months ago
- JAX implementation of the Llama 2 model ☆218 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton ☆67 · Updated 9 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆195 · Updated this week
- Fast low-bit matmul kernels in Triton ☆299 · Updated this week
- Collection of autoregressive model implementations ☆85 · Updated 3 weeks ago
- Multipack distributed sampler for fast padding-free training of LLMs ☆189 · Updated 9 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆173 · Updated last week
- Cataloging released Triton kernels. ☆221 · Updated 4 months ago
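One entry above refers to ZeRO-1 optimizer sharding: each data-parallel worker keeps the optimizer state (e.g. momentum buffers) for only its 1/N shard of the parameters, updates that shard, and the updated shards are all-gathered. A toy, single-process sketch of the idea in plain Python (function and variable names are invented for illustration, not taken from either linked implementation):

```python
def shard(xs, rank, world_size):
    """Contiguous 1-D shard of xs owned by `rank` (assumes an even split)."""
    n = len(xs) // world_size
    return xs[rank * n:(rank + 1) * n]

def zero1_step(params, grads, momenta, world_size, lr=0.1, beta=0.9):
    """One SGD-with-momentum step with ZeRO-1 style sharding.

    `momenta` is a list of per-rank momentum shards: only 1/world_size
    of the optimizer state lives on each "rank" (simulated serially here).
    """
    new_shards = []
    for rank in range(world_size):
        p_shard = shard(params, rank, world_size)
        g_shard = shard(grads, rank, world_size)
        # Each rank updates only its own shard of state and parameters.
        m_new = [beta * m + g for m, g in zip(momenta[rank], g_shard)]
        p_new = [p - lr * m for p, m in zip(p_shard, m_new)]
        momenta[rank] = m_new
        new_shards.append(p_new)
    # "All-gather": every rank ends up with the full updated parameters,
    # while optimizer state stays partitioned across ranks.
    return [p for s in new_shards for p in s]

params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
momenta = [[0.0, 0.0], [0.0, 0.0]]  # 2 ranks, each holds half the state
params = zero1_step(params, grads, momenta, world_size=2)
print(params)  # each param decreased by lr * momentum = 0.05
```

The memory saving is the point: full gradients and parameters are still replicated, but per-parameter optimizer state (which can dominate memory for Adam-style optimizers) is split 1/N per worker.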