AI-Hypercomputer / ray-tpu
☆15 · Updated 7 months ago
Alternatives and similar repositories for ray-tpu
Users interested in ray-tpu are comparing it to the libraries listed below
- ☆16 · Updated 7 months ago
- torchprime is a reference model implementation for PyTorch on TPU. ☆43 · Updated 2 months ago
- ☆121 · Updated last year
- (EasyDel Former) is a utility library designed to simplify and enhance development in JAX. ☆29 · Updated 3 weeks ago
- ☆21 · Updated last year
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆14 · Updated 2 years ago
- ☆20 · Updated 2 years ago
- Fast, Modern, and Low Precision PyTorch Optimizers ☆118 · Updated 3 months ago
- Some common Huggingface transformers in maximal update parametrization (µP) ☆87 · Updated 3 years ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆88 · Updated last month
- Blazing fast data loading with HuggingFace Dataset and Ray Data ☆16 · Updated last year
- DPO, but faster 🚀 ☆46 · Updated last year
- Transformer with Mu-Parameterization, implemented in Jax/Flax. Supports FSDP on TPU pods. ☆32 · Updated 6 months ago
- LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence ☆61 · Updated 3 years ago
- Various transformers for FSDP research ☆38 · Updated 3 years ago
- Randomized Positional Encodings Boost Length Generalization of Transformers ☆83 · Updated last year
- ☆16 · Updated last year
- ☆190 · Updated this week
- A toolkit for scaling law research ⚖ ☆53 · Updated 10 months ago
- ☆21 · Updated 9 months ago
- A place to store reusable transformer components of my own creation or found on the interwebs ☆63 · Updated last week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 · Updated 3 months ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc… ☆25 · Updated last month
- An implementation of the Llama architecture, to instruct and delight ☆21 · Updated 6 months ago
- ☆148 · Updated last month
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆112 · Updated last month
- ☆62 · Updated 3 years ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels ☆79 · Updated 3 weeks ago
- If it quacks like a tensor... ☆59 · Updated last year
- Research implementation of Native Sparse Attention (2502.11089) ☆63 · Updated 10 months ago