huggingface / optimum-graphcore
Blazing fast training of 🤗 Transformers on Graphcore IPUs
☆82 Updated 8 months ago
Related projects
Alternatives and complementary repositories for optimum-graphcore
- Inference code for LLaMA models in JAX ☆113 Updated 6 months ago
- ☆64 Updated 2 years ago
- Implementation of a Transformer, but completely in Triton ☆248 Updated 2 years ago
- ☆57 Updated 2 years ago
- ☆177 Updated last week
- JAX implementation of the Llama 2 model ☆210 Updated 9 months ago
- JAX-Toolbox ☆249 Updated this week
- Implementation of Flash Attention in Jax ☆197 Updated 8 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆153 Updated this week
- Train very large language models in Jax. ☆195 Updated last year
- Applied AI experiments and examples for PyTorch ☆168 Updated 3 weeks ago
- Google TPU optimizations for transformers models ☆75 Updated this week
- ☆237 Updated 3 months ago
- ☆77 Updated 5 months ago
- A library for unit scaling in PyTorch ☆105 Updated 2 weeks ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆194 Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆212 Updated 3 months ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆116 Updated 2 years ago
- ☆268 Updated this week
- ☆334 Updated 7 months ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 10 months ago
- Various transformers for FSDP research ☆33 Updated 2 years ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆80 Updated 11 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆146 Updated this week
- jax-triton contains integrations between JAX and OpenAI Triton ☆344 Updated this week
- ☆101 Updated last month
- some common Huggingface transformers in maximal update parametrization (µP) ☆77 Updated 2 years ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 Updated this week
- [WIP] A 🔥 interface for running code in the cloud ☆86 Updated last year
- Training material for IPU users: tutorials, feature examples, simple applications ☆87 Updated last year