huggingface / optimum-graphcore
Blazing fast training of 🤗 Transformers on Graphcore IPUs
⭐ 84 · Updated last year
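The repository's quick-start pattern is to swap the stock `Trainer` for `IPUTrainer` and add an `IPUConfig`. The sketch below is a minimal, hedged illustration of that flow: the class names follow the project's documented API, while the checkpoint, dataset, and hyperparameters are placeholder assumptions, and an IPU machine with the Poplar SDK is presumed.

```python
# Minimal fine-tuning sketch with optimum-graphcore (assumes an IPU machine with
# the Poplar SDK installed; model/dataset names below are illustrative placeholders).
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.graphcore import IPUConfig, IPUTrainer, IPUTrainingArguments

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2").map(
    lambda ex: tokenizer(ex["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Published IPU execution config on the Hub (assumed name).
ipu_config = IPUConfig.from_pretrained("Graphcore/bert-base-ipu")

training_args = IPUTrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = IPUTrainer(
    model=model,
    ipu_config=ipu_config,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```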
Alternatives and similar repositories for optimum-graphcore:
Users interested in optimum-graphcore are comparing it to the libraries listed below.
- ⭐ 67 · Updated 2 years ago
- Implementation of Flash Attention in Jax · ⭐ 207 · Updated last year
- ⭐ 344 · Updated 11 months ago
- Training material for IPU users: tutorials, feature examples, simple applications · ⭐ 86 · Updated 2 years ago
- JAX implementation of the Llama 2 model · ⭐ 217 · Updated last year
- Inference code for LLaMA models in JAX · ⭐ 116 · Updated 10 months ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) · ⭐ 187 · Updated 2 years ago
- ⭐ 293 · Updated last week
- ⭐ 185 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective (see the DeepSpeed sketch after this list) · ⭐ 164 · Updated last week
- ⭐ 60 · Updated 3 years ago
- jax-triton contains integrations between JAX and OpenAI Triton (see the jax-triton sketch after this list) · ⭐ 388 · Updated last week
- Train very large language models in Jax · ⭐ 203 · Updated last year
- Google TPU optimizations for transformers models · ⭐ 107 · Updated 2 months ago
- OSLO: Open Source for Large-scale Optimization · ⭐ 175 · Updated last year
- ⭐ 249 · Updated 8 months ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) · ⭐ 116 · Updated 3 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… · ⭐ 156 · Updated 4 months ago
- Implementation of a Transformer, but completely in Triton · ⭐ 263 · Updated 3 years ago
- Amos optimizer with JEstimator lib · ⭐ 82 · Updated 10 months ago
- Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes · ⭐ 238 · Updated last year
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" · ⭐ 363 · Updated last year
- Torch Distributed Experimental · ⭐ 115 · Updated 8 months ago
- Easy and lightning fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) (see the Gaudi sketch after this list) · ⭐ 183 · Updated this week
- [WIP] A 🔥 interface for running code in the cloud · ⭐ 86 · Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX · ⭐ 222 · Updated 8 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ⭐ 235 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" · ⭐ 273 · Updated last year
- Exploring finetuning public checkpoints on filtered 8K sequences from the Pile · ⭐ 115 · Updated 2 years ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* · ⭐ 81 · Updated last year
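For the DeepSpeed entry above, the core pattern is wrapping a model with `deepspeed.initialize` and letting the returned engine drive `backward`/`step`. A minimal sketch, assuming a CUDA machine and the `deepspeed` launcher; the toy model and config values are illustrative, not a tuned recipe.

```python
# Minimal DeepSpeed sketch (launch with `deepspeed train.py`); the model and
# config values here are illustrative assumptions, not a tuned recipe.
import torch
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 10)
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},  # shard optimizer state and gradients
    "optimizer": {"type": "AdamW", "params": {"lr": 3e-4}},
}

# deepspeed.initialize wraps the model and optimizer in a distributed engine.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

inputs = torch.randn(4, 512).to(model_engine.device).half()
labels = torch.randint(0, 10, (4,)).to(model_engine.device)

loss = torch.nn.functional.cross_entropy(model_engine(inputs), labels)
model_engine.backward(loss)  # engine handles loss scaling / ZeRO partitioning
model_engine.step()
```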
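For the jax-triton entry, the integration point is `jax_triton.triton_call`, which invokes a Triton kernel on JAX arrays. The sketch below follows the vector-add pattern from that project's README, assuming a CUDA GPU; the hard-coded length of 8 keeps the example minimal.

```python
# Vector add via a Triton kernel called from JAX with jax-triton.
# Requires a CUDA GPU; the length-8 arrays keep the sketch minimal.
import jax
import jax.numpy as jnp
import jax_triton as jt
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, block_size: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * block_size + tl.arange(0, block_size)
    mask = offsets < 8  # assumes length-8 inputs
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: jnp.ndarray, y: jnp.ndarray) -> jnp.ndarray:
    out_shape = jax.ShapeDtypeStruct(shape=x.shape, dtype=x.dtype)
    # Extra keyword arguments are forwarded to the kernel as meta-parameters.
    return jt.triton_call(x, y, kernel=add_kernel, out_shape=out_shape,
                          grid=(1,), block_size=8)

print(add(jnp.arange(8), jnp.arange(8, 16)))
```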
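For the Optimum Habana entry, the flow mirrors the Graphcore sketch at the top: replace `Trainer` with `GaudiTrainer` and supply a `GaudiConfig`. Class names follow that project's documentation; the checkpoint, config name, dataset, and arguments are placeholder assumptions, and a Gaudi (HPU) host with SynapseAI is presumed.

```python
# Minimal fine-tuning sketch with optimum-habana; runs only on a Gaudi/HPU host.
# Checkpoint, dataset, and hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "bert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("glue", "sst2").map(
    lambda ex: tokenizer(ex["sentence"], truncation=True, padding="max_length", max_length=128),
    batched=True,
)

# Assumed published HPU config repo on the Hub.
gaudi_config = GaudiConfig.from_pretrained("Habana/bert-base-uncased")

args = GaudiTrainingArguments(
    output_dir="./outputs",
    use_habana=True,      # run on HPU instead of CPU/GPU
    use_lazy_mode=True,   # lazy graph mode, the usual default on Gaudi
    per_device_train_batch_size=8,
    num_train_epochs=1,
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=gaudi_config,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    tokenizer=tokenizer,
)
trainer.train()
```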