AI-Hypercomputer / ray-tpu
☆15 · Updated 4 months ago
Alternatives and similar repositories for ray-tpu
Users interested in ray-tpu are comparing it to the libraries listed below.
- torchprime is a reference model implementation for PyTorch on TPU. ☆39 · Updated last week
- ☆122 · Updated last year
- Experimenting with how best to do multi-host dataloading ☆10 · Updated 2 years ago
- ☆14 · Updated 4 months ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆81 · Updated last month
- ☆20 · Updated 2 years ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆113 · Updated last month
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆16 · Updated last year
- Various transformers for FSDP research ☆38 · Updated 2 years ago
- Machine Learning eXperiment Utilities ☆47 · Updated 2 months ago
- ☆62 · Updated 3 years ago
- ☆21 · Updated 7 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆60 · Updated 2 weeks ago
- ☆15 · Updated last year
- A library for unit scaling in PyTorch ☆130 · Updated 2 months ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆23 · Updated last month
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆82 · Updated last year
- Implementation of a Light Recurrent Unit in PyTorch ☆49 · Updated last year
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆107 · Updated 6 months ago
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 4 months ago
- DPO, but faster 🚀 ☆44 · Updated 10 months ago
- Some common Huggingface transformers in maximal update parametrization (µP) ☆82 · Updated 3 years ago
- Griffin MQA + Hawk Linear RNN Hybrid ☆89 · Updated last year
- EasyDel Former is a utility library designed to simplify and enhance development in JAX ☆28 · Updated last week
- ☆14 · Updated last year
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels ☆70 · Updated last month
- Triton Implementation of the HyperAttention Algorithm ☆48 · Updated last year
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆66 · Updated last year
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆125 · Updated 10 months ago