PanZaifeng / G-SLIDE
☆14 · Updated 3 years ago
Alternatives and similar repositories for G-SLIDE:
Users who are interested in G-SLIDE are comparing it to the libraries listed below.
- ☆15 · Updated 2 years ago
- A Learnable LSH Framework for Efficient NN Training ☆31 · Updated 3 years ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 11 months ago
- DL Dataloader Benchmarks ☆18 · Updated 3 weeks ago
- ☆14 · Updated 2 years ago
- Distributed DataLoader For Pytorch Based On Ray ☆24 · Updated 3 years ago
- Distributed ML Optimizer ☆30 · Updated 3 years ago
- Accelerating Recommender model training by leveraging popular choices -- VLDB 2022 ☆30 · Updated 5 months ago
- [ICLR 2022] Code for paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆22 · Updated last year
- ☆22 · Updated 4 years ago
- ☆73 · Updated 3 years ago
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Large-scale Models. ICML 2021 ☆55 · Updated 3 years ago
- Set of datasets for the deep learning recommendation model (DLRM). ☆41 · Updated 2 years ago
- Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef… ☆61 · Updated 2 years ago
- Fast sparse deep learning on CPUs ☆52 · Updated 2 years ago
- A study of the downstream instability of word embeddings ☆12 · Updated 2 years ago
- [KDD 2022] AutoShard: Automated Embedding Table Sharding for Recommender Systems ☆21 · Updated last year
- Inference framework for MoE layers based on TensorRT with Python binding ☆41 · Updated 3 years ago
- Light-weight GPU kernel interface for graph operations ☆15 · Updated 4 years ago
- ☆38 · Updated last year
- Code for ICML2019 paper: Learning to Route in Similarity Graphs ☆58 · Updated 6 months ago
- Code for paper 'Minimizing FLOPs to Learn Efficient Sparse Representations' published at ICLR 2020 ☆20 · Updated 5 years ago
- This is the (evolving) reading list for the seminar. ☆57 · Updated 4 years ago
- [ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining ☆12 · Updated last year
- a minimal cache manager for PagedAttention, on top of llama3. ☆67 · Updated 5 months ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation ☆29 · Updated 2 weeks ago
- A "gym" style toolkit for building lightweight NAS systems. ☆13 · Updated 2 years ago
- PyTorch-Direct code on top of PyTorch-1.8.0nightly (e152ca5) for Large Graph Convolutional Network Training with GPU-Oriented Data Commun… ☆45 · Updated last year
- Code for paper: Towards Similarity Graphs Constructed by Deep Reinforcement Learning ☆21 · Updated 5 years ago