yandex-research / btard
Code for the paper "Secure Distributed Training at Scale" (ICML 2022)
Related projects
Alternatives and complementary repositories for btard:
- "Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts" (NeurIPS 2020), original PyTorch implementation
- Compression scheme for gradients of activations in the backward pass
- Towards Understanding Sharpness-Aware Minimization (ICML 2022)
- Memory-efficient transformer; work in progress
- Training vision models with full-batch gradient descent and regularization
- Code release for "REPAIR: REnormalizing Permuted Activations for Interpolation Repair"
- Python library for argument and configuration management
- "Efficient Lottery Ticket Finding: Less Data is More" (ICML 2021) by Zhenyu Zhang*, Xuxi Chen*, Tianlong Chen*, Zhangyang Wang
- Spartan, an algorithm for training sparse neural network models; this repository accompanies the paper "Spartan Differentiable Sparsity…"
- Lightweight PyTorch implementation of RigL, a sparse-to-sparse optimizer
- Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021)
- Latest Weight Averaging (NeurIPS HITY 2022)
- Implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning"
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation
- Code accompanying the NeurIPS 2020 paper on WoodFisher (Singh & Alistarh, 2020)
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)
- Code for "Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot"
- "Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation" (ICLR 2023)
- A centralized place for deep-thinking code and experiments
- "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" (ICLR 2023), Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen…
- Source code of "What can linearized neural networks actually say about generalization?"
- Code for the paper "Why Transformers Need Adam: A Hessian Perspective"
- Git Re-Basin: Merging Models modulo Permutation Symmetries, in PyTorch
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021)
- "MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge" (NeurIPS 2021), Geng Yuan, Xiaolong Ma, Yanzhi Wang et al…
- Code for "Picking Winning Tickets Before Training by Preserving Gradient Flow": https://openreview.net/pdf?id=SkgsACVKPH
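
A recurring theme in the list above is communication-efficient distributed training. As a rough illustration of the low-rank gradient compression idea behind entries such as PowerSGD (https://arxiv.org/abs/1905.13727), here is a minimal NumPy sketch of one power-iteration compress/reconstruct step. The function name and structure are illustrative assumptions, not code from any of the listed repositories, which add error feedback and all-reduce the factors across workers:

```python
import numpy as np

def powersgd_like_approx(grad, rank=1, seed=0):
    """Compress a 2-D gradient into two rank-r factors, then reconstruct it.

    In a real distributed setup, workers would all-reduce the small factors
    (m x r and n x r) instead of the full m x n gradient.
    """
    rng = np.random.default_rng(seed)
    m, n = grad.shape
    q = rng.standard_normal((n, rank))   # shared random right sketch
    p = grad @ q                         # left factor, m x r
    p_hat, _ = np.linalg.qr(p)           # orthonormalize before the second pass
    q = grad.T @ p_hat                   # right factor, n x r
    return p_hat @ q.T                   # rank-r approximation of grad
```

For a gradient that is already (close to) low rank, the reconstruction is near-exact, while the communicated factors are a small fraction of the full matrix.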