facebookresearch / FBTT-Embedding
This is a Tensor Train-based compression library for sparse embedding tables used in large-scale machine learning models, such as those for recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation …
☆194 Updated 2 years ago
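To make the description above concrete, here is a minimal NumPy sketch of the general Tensor Train idea for embedding tables: a table with N = n1·n2·n3 rows and D = d1·d2·d3 columns is stored as three small cores, and a single row is rebuilt on demand by contracting one slice from each core. This is an illustration only, not FBTT-Embedding's API; the function names (`make_tt_cores`, `tt_lookup`, `tt_to_dense`), the factor sizes, and the rank are all assumptions chosen to keep the example small.

```python
import numpy as np

def make_tt_cores(row_factors, col_factors, rank, seed=0):
    """Create random TT cores; core k has shape (r_prev, n_k, d_k, r_next)."""
    rng = np.random.default_rng(seed)
    ranks = [1] + [rank] * (len(row_factors) - 1) + [1]
    return [
        rng.standard_normal((ranks[k], n, d, ranks[k + 1]))
        for k, (n, d) in enumerate(zip(row_factors, col_factors))
    ]

def tt_lookup(cores, index):
    """Rebuild one embedding row without materializing the full table."""
    row_factors = [core.shape[1] for core in cores]
    # Split the flat row index into one sub-index per core (row-major order).
    sub_indices = np.unravel_index(index, row_factors)
    out = np.ones((1, 1))  # shape: (columns built so far, current rank)
    for core, i in zip(cores, sub_indices):
        core_slice = core[:, i, :, :]  # shape: (r_prev, d_k, r_next)
        out = np.einsum("ar,rds->ads", out, core_slice)
        out = out.reshape(-1, core_slice.shape[2])
    return out.reshape(-1)

def tt_to_dense(cores):
    """Expand the cores into the full table (only feasible for tiny sizes)."""
    full = np.ones((1, 1, 1))  # shape: (rows so far, columns so far, rank)
    for core in cores:
        full = np.einsum("NDr,rnds->NnDds", full, core)
        rows, n, cols, d, r_next = full.shape
        full = full.reshape(rows * n, cols * d, r_next)
    return full[:, :, 0]

# Hypothetical sizes: 4*5*6 = 120 rows, 2*3*4 = 24 columns, TT rank 3.
cores = make_tt_cores(row_factors=(4, 5, 6), col_factors=(2, 3, 4), rank=3)
embedding = tt_lookup(cores, 57)
dense = tt_to_dense(cores)

tt_params = sum(core.size for core in cores)
dense_params = dense.size
print(f"TT parameters: {tt_params}, dense parameters: {dense_params}")
print("lookup matches dense row:", np.allclose(embedding, dense[57]))
```

The saving grows with table size because the core sizes scale with the sum of the row factors rather than their product; the ratio reached on a real model depends on the chosen factors and ranks, which this sketch does not try to reproduce.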
Alternatives and similar repositories for FBTT-Embedding:
Users who are interested in FBTT-Embedding compare it with the libraries listed below.
- Research and development for optimizing transformers ☆125 Updated 4 years ago
- PyTorch implementation of L2L execution algorithm ☆107 Updated 2 years ago
- http://vlsiarch.eecs.harvard.edu/research/recommendation/ ☆134 Updated 2 years ago
- Simple Distributed Deep Learning on TensorFlow ☆134 Updated 2 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆257 Updated 2 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆130 Updated 3 years ago
- FTPipe and related pipeline model parallelism research. ☆41 Updated last year
- High performance distributed framework for training deep learning recommendation models based on PyTorch. ☆404 Updated this week
- Block-sparse primitives for PyTorch ☆155 Updated 4 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆298 Updated 3 years ago
- distributed-embeddings is a library for building large embedding based models in Tensorflow 2. ☆44 Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark. ☆56 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆156 Updated 4 months ago
- A GPU performance profiling tool for PyTorch models ☆505 Updated 3 years ago
- Running BERT without Padding ☆471 Updated 3 years ago
- A library of GPU kernels for sparse matrix operations. ☆262 Updated 4 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆146 Updated 5 months ago
- Set of datasets for the deep learning recommendation model (DLRM). ☆45 Updated 2 years ago
- Simple Training and Deployment of Fast End-to-End Binary Networks ☆157 Updated 3 years ago
- Implementation of a Transformer, but completely in Triton ☆263 Updated 3 years ago
- ☆142 Updated 2 months ago
- Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)… ☆28 Updated 3 years ago
- ☆8 Updated last year
- Implementing Google's DistBelief paper ☆109 Updated 2 years ago
- ☆251 Updated 9 months ago
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆69 Updated 3 years ago
- An Efficient Pipelined Data Parallel Approach for Training Large Model ☆75 Updated 4 years ago
- PyTorch elastic training ☆730 Updated 2 years ago
- Fast sparse deep learning on CPUs ☆53 Updated 2 years ago
- Python bindings for NVTX ☆66 Updated last year