IntelLabs / SLIDE_opt_ia
☆74 · Updated last year
Alternatives and similar repositories for SLIDE_opt_ia
Users who are interested in SLIDE_opt_ia are comparing it to the libraries listed below.
- benchmarking some transformer deployments ☆26 · Updated 2 years ago
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … ☆106 · Updated 7 months ago
- Python Research Framework ☆106 · Updated 2 years ago
- A GPT, made only of MLPs, in Jax ☆58 · Updated 4 years ago
- Swarm training framework using Haiku + JAX + Ray for layer parallel transformer language models on unreliable, heterogeneous nodes ☆241 · Updated 2 years ago
- PyTorch interface for the IPU ☆180 · Updated last year
- A collection of optimizers, some arcane others well known, for Flax. ☆29 · Updated 4 years ago
- PyTorch code for the Nero optimiser. ☆20 · Updated 2 years ago
- Memory Efficient Attention (O(sqrt(n))) for Jax and PyTorch ☆184 · Updated 2 years ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation ☆29 · Updated 6 months ago
- ☆39 · Updated 2 years ago
- Implementation of the specific Transformer architecture from PaLM - Scaling Language Modeling with Pathways - in Jax (Equinox framework) ☆187 · Updated 3 years ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆117 · Updated 3 years ago
- [NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks ☆60 · Updated 2 years ago
- Customized matrix multiplication kernels ☆56 · Updated 3 years ago
- PyTorch implementation of L2L execution algorithm ☆108 · Updated 2 years ago
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆148 · Updated 2 years ago
- Torch Distributed Experimental ☆117 · Updated last year
- Lightweight machine learning library based on OpenCL 1.2 ☆75 · Updated 4 years ago
- SLIDE (Sub-LInear Deep learning Engine) written in Go ☆45 · Updated 5 years ago
- a lightweight transformer library for PyTorch ☆72 · Updated 3 years ago
- GPU implementation of a fast generalized ANS (asymmetric numeral system) entropy encoder and decoder, with extensions for lossless compre… ☆347 · Updated 2 months ago
- An implementation of Additive Attention ☆149 · Updated 3 years ago
- Alpha Zero equipped with Transformer with various novel techniques for speedup in tree search ☆27 · Updated 6 years ago
- DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight … ☆236 · Updated 2 years ago
- A tracing JIT compiler for PyTorch ☆13 · Updated 3 years ago
- HetSeq: Distributed GPU Training on Heterogeneous Infrastructure ☆106 · Updated 2 years ago
- Utilities for sequential processing of tar files. ☆24 · Updated 3 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆180 · Updated this week
- Implementation of a Tensorflow XLA rematerialization pass ☆15 · Updated 5 years ago