facebookresearch / FBTT-Embedding
This is a Tensor Train-based compression library for the sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation …
☆194Updated 2 years ago
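As a rough illustration of the Tensor Train idea described above, the following minimal NumPy sketch stores an embedding table as three small cores and rebuilds a single row on demand. It is not the FBTT-Embedding API: the function names (`tt_shapes`, `tt_param_count`, `tt_lookup`), the factor sizes, and the ranks are all hypothetical and chosen only for the example.

```python
import numpy as np


def tt_shapes(row_factors, col_factors, ranks):
    """Shapes of the TT cores: core k is (r_{k-1}, n_k, m_k, r_k), boundary ranks are 1."""
    full_ranks = [1, *ranks, 1]
    return [
        (full_ranks[k], n, m, full_ranks[k + 1])
        for k, (n, m) in enumerate(zip(row_factors, col_factors))
    ]


def tt_param_count(row_factors, col_factors, ranks):
    """Number of parameters stored by the cores (compare with rows * columns for a dense table)."""
    return sum(int(np.prod(shape)) for shape in tt_shapes(row_factors, col_factors, ranks))


def tt_lookup(cores, row_factors, index):
    """Rebuild one embedding row from the cores without materialising the full table."""
    # Split the flat row index into one digit per core (row-major, last factor varies fastest).
    digits = np.unravel_index(index, tuple(row_factors))
    row = np.ones((1, 1))  # shape: (columns built so far, current rank)
    for core, digit in zip(cores, digits):
        piece = core[:, digit, :, :]  # shape: (r_prev, m_k, r_next)
        # Contract over the shared rank, then fold the new column factor into the row.
        row = np.einsum("cr,rms->cms", row, piece)
        row = row.reshape(-1, row.shape[-1])
    return row[:, 0]  # the final rank is 1


# Hypothetical sizes: a 120 x 16 table factored as (4*5*6) x (2*2*4) with TT ranks (3, 3).
row_factors, col_factors, ranks = (4, 5, 6), (2, 2, 4), (3, 3)
rng = np.random.default_rng(0)
cores = [rng.standard_normal(shape) for shape in tt_shapes(row_factors, col_factors, ranks)]

embedding = tt_lookup(cores, row_factors, index=42)
dense_params = int(np.prod(row_factors)) * int(np.prod(col_factors))
print(embedding.shape, dense_params, tt_param_count(row_factors, col_factors, ranks))
```

The saving grows with table size because the core sizes depend on the individual factors rather than on their product. The ratio for a toy table like this one says nothing about the 100x figure quoted above, which refers to the full DLRM model.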
Alternatives and similar repositories for FBTT-Embedding
Users who are interested in FBTT-Embedding are comparing it to the libraries listed below.
- Research and development for optimizing transformers☆129Updated 4 years ago
- Simple Distributed Deep Learning on TensorFlow☆133Updated last month
- A tensor-aware point-to-point communication primitive for machine learning☆259Updated 2 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory☆132Updated 3 years ago
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.☆57Updated 2 years ago
- PyTorch RFCs (experimental)☆133Updated last month
- PyTorch implementation of L2L execution algorithm☆107Updated 2 years ago
- High performance distributed framework for training deep learning recommendation models based on PyTorch.☆409Updated last month
- Slicing a PyTorch Tensor Into Parallel Shards☆299Updated last month
- Block-sparse primitives for PyTorch☆157Updated 4 years ago
- Fast sparse deep learning on CPUs☆53Updated 2 years ago
- ☆251Updated 11 months ago
- A library for syntactically rewriting Python programs, pronounced (sinner).☆69Updated 3 years ago
- http://vlsiarch.eecs.harvard.edu/research/recommendation/☆136Updated 2 years ago
- DLPack for TensorFlow☆35Updated 5 years ago
- Training material for IPU users: tutorials, feature examples, simple applications☆86Updated 2 years ago
- distributed-embeddings is a library for building large embedding-based models in TensorFlow 2.☆44Updated last year
- A GPU performance profiling tool for PyTorch models☆503Updated 4 years ago
- A Tool for Automatic Parallelization of Deep Learning Training in Distributed Multi-GPU Environments.☆132Updated 3 years ago
- Distributed preprocessing and data loading for language datasets☆39Updated last year
- Implementation of a Transformer, but completely in Triton☆270Updated 3 years ago
- PyTorch interface for the IPU☆180Updated last year
- Stride visualizations☆37Updated 7 years ago
- Fast Block Sparse Matrices for PyTorch☆548Updated 4 years ago
- Running BERT without Padding☆472Updated 3 years ago
- PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge.☆199Updated 6 years ago
- Torch Distributed Experimental☆116Updated 11 months ago
- ☆55Updated last year
- FTPipe and related pipeline model parallelism research.☆41Updated 2 years ago
- MONeT framework for reducing memory consumption of DNN training☆173Updated 4 years ago