facebookresearch / FBTT-Embedding
This is a Tensor Train based compression library for compressing the sparse embedding tables used in large-scale machine learning models such as recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook's open-sourced DLRM model while achieving the same model quality. Our implementation …
☆193 · Updated 2 years ago
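To make the Tensor Train idea concrete, here is a minimal pure-Python sketch. It assumes the common TT-matrix formulation: an N x D embedding table is viewed with N = n_1·…·n_K and D = d_1·…·d_K, and each entry is a chain product of small core matrices whose ranks start and end at 1. The function names (`make_cores`, `tt_lookup`, `tt_param_count`) and the factor and rank choices are hypothetical illustrations. They are not FBTT-Embedding's API, and the sizing numbers are not taken from DLRM.

```python
import math
import random

# Hypothetical sketch of a Tensor-Train (TT) embedding table; NOT FBTT-Embedding's API.
# An N x D table with N = n_1*...*n_K and D = d_1*...*d_K is stored as K cores:
# cores[k][i_k][j_k] is an r_{k-1} x r_k matrix, with r_0 = r_K = 1.

def make_cores(row_factors, col_factors, ranks, seed=0):
    """Randomly initialise TT cores with small Gaussian values."""
    rng = random.Random(seed)
    cores = []
    for k, (n, d) in enumerate(zip(row_factors, col_factors)):
        r_in, r_out = ranks[k], ranks[k + 1]
        core = [[[[rng.gauss(0.0, 0.1) for _ in range(r_out)] for _ in range(r_in)]
                 for _ in range(d)] for _ in range(n)]
        cores.append(core)
    return cores

def to_multi_index(i, factors):
    """Mixed-radix split of a flat index, most significant factor first."""
    digits = []
    for n in reversed(factors):
        digits.append(i % n)
        i //= n
    return tuple(reversed(digits))

def vec_mat(v, m):
    """Row vector (length r_in) times matrix (r_in x r_out)."""
    return [sum(v[a] * m[a][b] for a in range(len(v))) for b in range(len(m[0]))]

def tt_lookup(cores, row_factors, i):
    """Reconstruct embedding row i without materialising the full N x D table."""
    idx = to_multi_index(i, row_factors)
    # partial holds one 1 x r_k vector per column prefix, in lexicographic order.
    partial = [[1.0]]
    for core, i_k in zip(cores, idx):
        slices = core[i_k]  # list over j_k of r_{k-1} x r_k matrices
        partial = [vec_mat(v, m) for v in partial for m in slices]
    return [v[0] for v in partial]  # r_K == 1, so each entry is a scalar

def tt_param_count(row_factors, col_factors, ranks):
    """Number of stored TT parameters: sum_k r_{k-1} * n_k * d_k * r_k."""
    return sum(ranks[k] * n * d * ranks[k + 1]
               for k, (n, d) in enumerate(zip(row_factors, col_factors)))

if __name__ == "__main__":
    rows, cols, ranks = (2, 3), (2, 2), (1, 2, 1)
    cores = make_cores(rows, cols, ranks)
    print(len(tt_lookup(cores, rows, 5)))  # 4, i.e. D = 2 * 2
    # Hypothetical sizing: 11M rows x 64 dims with TT-ranks of 16.
    big_rows, big_cols, big_ranks = (200, 220, 250), (4, 4, 4), (1, 16, 16, 1)
    full = math.prod(big_rows) * math.prod(big_cols)
    tt = tt_param_count(big_rows, big_cols, big_ranks)
    print(full, tt, full // tt)  # 704000000 254080 2770
```

Storage grows with the sum of the core sizes rather than with N·D, which is where the compression comes from. The achievable ratio depends on the chosen factorization and TT-ranks, and a practical implementation would batch lookups on the GPU instead of looping in Python.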
Alternatives and similar repositories for FBTT-Embedding:
Users who are interested in FBTT-Embedding are comparing it to the libraries listed below.
- http://vlsiarch.eecs.harvard.edu/research/recommendation/ ☆133 · Updated 2 years ago
- Research and development for optimizing transformers ☆125 · Updated 4 years ago
- Simple Distributed Deep Learning on TensorFlow ☆134 · Updated 2 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆256 · Updated 2 years ago
- distributed-embeddings is a library for building large embedding-based models in TensorFlow 2. ☆43 · Updated last year
- Block-sparse primitives for PyTorch ☆154 · Updated 3 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆130 · Updated 3 years ago
- PyTorch implementation of the L2L execution algorithm ☆107 · Updated 2 years ago
- High-performance distributed framework for training deep learning recommendation models based on PyTorch. ☆402 · Updated this week
- Slicing a PyTorch Tensor Into Parallel Shards ☆298 · Updated 3 years ago
- FTPipe and related pipeline model parallelism research. ☆41 · Updated last year
- A GPU performance profiling tool for PyTorch models ☆505 · Updated 3 years ago
- ☆248 · Updated 7 months ago
- A library of GPU kernels for sparse matrix operations. ☆260 · Updated 4 years ago
- Time-based Sequence Model for Personalization and Recommendation Systems ☆49 · Updated 3 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆146 · Updated 4 months ago
- Running BERT without Padding ☆472 · Updated 3 years ago
- Fast Block Sparse Matrices for PyTorch ☆546 · Updated 4 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆155 · Updated 3 months ago
- Development repository for integrating FlexFlow (A distributed deep learning framework that supports flexible parallelization strategies)… ☆28 · Updated 3 years ago
- Torch Distributed Experimental ☆115 · Updated 7 months ago
- PyTorch RFCs (experimental) ☆131 · Updated 6 months ago
- Enabling pure data-parallel training of DLRM via caching and prefetching ☆17 · Updated 3 years ago
- Fast sparse deep learning on CPUs ☆52 · Updated 2 years ago
- HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training ☆979 · Updated this week
- Differentiable Product Quantization for End-to-End Embedding Compression. ☆60 · Updated 2 years ago
- Set of datasets for the deep learning recommendation model (DLRM). ☆42 · Updated 2 years ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆178 · Updated 3 months ago
- Distributed preprocessing and data loading for language datasets ☆39 · Updated 11 months ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 · Updated 3 years ago