huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆123 · Updated 10 months ago
Alternatives and similar repositories for optimum-tpu
Users interested in optimum-tpu are comparing it to the libraries listed below.
- ☆136 · Updated last year
- Load compute kernels from the Hub ☆337 · Updated last week
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆202 · Updated last year
- ☆138 · Updated 3 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆277 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- 👷 Build compute kernels ☆190 · Updated this week
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆99 · Updated 6 months ago
- Collection of autoregressive model implementations ☆86 · Updated 7 months ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference