IntelLabs / SLIDE_opt_ia
☆74, updated last year
Alternatives and similar repositories for SLIDE_opt_ia:
Users who are interested in SLIDE_opt_ia are comparing it to the libraries listed below.
- Benchmarking some transformer deployments (☆26, updated 2 years ago)
- Swarm training framework using Haiku + JAX + Ray for layer-parallel transformer language models on unreliable, heterogeneous nodes (☆237, updated last year)
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository … (☆106, updated 2 months ago)
- A collection of optimizers, some arcane, others well known, for Flax (☆29, updated 3 years ago)
- Customized matrix multiplication kernels (☆54, updated 3 years ago)
- Torch Distributed Experimental (☆115, updated 7 months ago)
- Stride visualizations (☆37, updated 6 years ago)
- Productionize machine learning predictions, with ONNX or without (☆65, updated last year)
- A Learnable LSH Framework for Efficient NN Training (☆31, updated 3 years ago)
- ☆39, updated 2 years ago
- Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)… (☆28, updated 3 years ago)
- PyProf2: PyTorch profiling tool (☆82, updated 4 years ago)
- 👑 PyTorch code for the Nero optimiser (☆20, updated 2 years ago)
- A tensor-aware point-to-point communication primitive for machine learning (☆256, updated 2 years ago)
- Unifying Python/C++/CUDA memory: Python buffered array ↔️ `std::vector` ↔️ CUDA managed memory (☆80, updated 3 weeks ago)
- The Foundation for All Legate Libraries (☆207, updated this week)
- A lightweight transformer library for PyTorch (☆71, updated 3 years ago)
- PyTorch implementation of the L2L execution algorithm (☆107, updated 2 years ago)
- ☆68, updated last year
- An Aspiring Drop-In Replacement for Pandas at Scale (☆75, updated 3 years ago)
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … (☆65, updated 3 years ago)
- HetSeq: Distributed GPU Training on Heterogeneous Infrastructure (☆106, updated last year)
- ☆13, updated 3 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… (☆178, updated 3 months ago)
- [JMLR'20] NeurIPS 2019 MicroNet Challenge Efficient Language Modeling, Champion (☆40, updated 4 years ago)
- ☆471, updated 3 years ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation (☆29, updated last month)
- SLIDE (Sub-LInear Deep learning Engine) written in Go (☆44, updated 4 years ago)
- A Tensor Train-based compression library to compress sparse embedding tables used in large-scale machine learning models such as … (☆193, updated 2 years ago)
- ☆21, updated 2 years ago