facebookresearch / FBTT-Embedding
This is a Tensor Train-based compression library for sparse embedding tables used in large-scale machine learning models, such as those for recommendation and natural language processing. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation …
☆194 Updated 2 years ago
Alternatives and similar repositories for FBTT-Embedding
Users who are interested in FBTT-Embedding are comparing it to the libraries listed below.
- http://vlsiarch.eecs.harvard.edu/research/recommendation/ ☆135 Updated 2 years ago
- Research and development for optimizing transformers ☆129 Updated 4 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆132 Updated 3 years ago
- High performance distributed framework for training deep learning recommendation models based on PyTorch. ☆407 Updated last week
- Simple Distributed Deep Learning on TensorFlow ☆133 Updated last week
- PyTorch implementation of L2L execution algorithm ☆107 Updated 2 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆258 Updated 2 years ago
- Running BERT without Padding ☆471 Updated 3 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆299 Updated 2 weeks ago
- distributed-embeddings is a library for building large embedding based models in Tensorflow 2. ☆44 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆157 Updated this week
- Block-sparse primitives for PyTorch ☆156 Updated 4 years ago
- FTPipe and related pipeline model parallelism research. ☆41 Updated 2 years ago
- Fast Block Sparse Matrices for Pytorch ☆547 Updated 4 years ago
- ☆250 Updated 11 months ago
- Efficient, check-pointed data loading for deep learning with massive data sets. ☆208 Updated 2 years ago
- A GPU performance profiling tool for PyTorch models ☆503 Updated 3 years ago
- A library for syntactically rewriting Python programs, pronounced (sinner). ☆69 Updated 3 years ago
- Implementation of a Transformer, but completely in Triton ☆268 Updated 3 years ago
- A library of GPU kernels for sparse matrix operations. ☆265 Updated 4 years ago
- A deep ranking personalization framework ☆134 Updated last year
- This repository contains the results and code for the MLPerf™ Training v0.7 benchmark. ☆56 Updated 2 years ago
- PyTorch RFCs (experimental) ☆133 Updated last month
- Time-based Sequence Model for Personalization and Recommendation Systems ☆49 Updated 3 years ago
- Practical low-rank gradient compression for distributed optimization: https://arxiv.org/abs/1905.13727 ☆147 Updated 7 months ago
- PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge. ☆197 Updated 6 years ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆323 Updated 2 years ago
- HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training ☆1,013 Updated 3 months ago
- Distributed ML Optimizer ☆32 Updated 3 years ago
- Pytorch Lightning Distributed Accelerators using Ray ☆211 Updated last year