NVIDIA-Merlin / HierarchicalKVLinks

HierarchicalKV is a part of NVIDIA Merlin and provides hierarchical key-value storage to meet RecSys requirements. The key capability of HierarchicalKV is to store key-value feature-embeddings on high-bandwidth memory (HBM) of GPUs and in host memory. It also can be used as a generic key-value storage.

☆173

Alternatives and similar repositories for HierarchicalKV

Users that are interested in HierarchicalKV are comparing it to the libraries listed below

Sorting:

DeepRec-AI / HybridBackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous cluster
☆159Updated last year
alibaba / TePDist
TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models.
☆97Updated 2 years ago
google / nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
☆121Updated last year
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆62Updated last year
AlibabaPAI / DAPPLE
An Efficient Pipelined Data Parallel Approach for Training Large Model
☆76Updated 4 years ago
UofT-EcoSystem / hotline
☆32Updated 2 years ago
AlibabaPAI / torchacc
PyTorch distributed training acceleration framework
☆53Updated 2 months ago
parasailteam / coconet
☆83Updated 2 years ago
alibaba / EasyParallelLibrary
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
☆269Updated 2 years ago
Funatiq / gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
☆59Updated 3 years ago
microsoft / mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
☆425Updated this week
quiver-team / quiver-feature
High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph
☆55Updated 3 years ago
microsoft / msccl
Microsoft Collective Communication Library
☆367Updated 2 years ago
NVIDIA / recsys-examples
Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
☆154Updated this week
AlibabaResearch / recom
An Optimizing Compiler for Recommendation Model Inference
☆26Updated 4 months ago
bytedance / ps-lite
A lightweight parameter server interface
☆83Updated 2 years ago
NVIDIA-Merlin / distributed-embeddings
distributed-embeddings is a library for building large embedding based models in Tensorflow 2.
☆46Updated 2 years ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122Updated 3 years ago
tlc-pack / relax
☆193Updated 2 years ago
microsoft / NPKit
NCCL Profiling Kit
☆145Updated last year
Mellanox / nccl-rdma-sharp-plugins
RDMA and SHARP plugins for nccl library
☆209Updated last month
OpenPPL / ppl.llm.kernel.cuda
☆150Updated 9 months ago
infinigence / Semi-PD
A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation.
☆112Updated 5 months ago
apache / tvm-rfcs
A home for the final text of all TVM RFCs.
☆108Updated last year
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆188Updated 6 years ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147Updated 3 years ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181Updated last week
bytedance / ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver…
☆265Updated 2 months ago
ColfaxResearch / cfx-article-src
☆150Updated 5 months ago
yalue / cuda_scheduling_examiner_mirror
A tool for examining GPU scheduling behavior.
☆88Updated last year