daochenzha / neuroshard
[MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
☆16 · Updated last year
Related projects
Alternatives and complementary repositories for neuroshard
- [KDD 2022] AutoShard: Automated Embedding Table Sharding for Recommender Systems ☆21 · Updated last year
- [NeurIPS 2022] DreamShard: Generalizable Embedding Table Placement for Recommender Systems ☆28 · Updated last year
- [EMNLP 2024 Main] Virtual Personas for Language Models via an Anthology of Backstories ☆18 · Updated this week
- Modular and structured prompt caching for low-latency LLM inference ☆68 · Updated last week
- ☆29 · Updated 3 months ago
- Largest real-world open-source graph dataset - Work done under the IBM-Illinois Discovery Accelerator Institute and Amazon Research Awards a… ☆76 · Updated 2 months ago
- ☆18 · Updated 2 years ago
- The official SALIENT system described in the paper "Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and P… ☆38 · Updated last year
- Set of datasets for the deep learning recommendation model (DLRM). ☆41 · Updated last year
- PyTorch-Direct code on top of PyTorch-1.8.0nightly (e152ca5) for Large Graph Convolutional Network Training with GPU-Oriented Data Commun… ☆45 · Updated last year
- Retrieval with Learned Similarities ☆15 · Updated this week
- ☆72 · Updated 3 years ago
- PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆74 · Updated last year
- Accelerating Recommender model training by leveraging popular choices -- VLDB 2022 ☆29 · Updated 2 months ago
- Distributed Deep Graph Learning Framework for Dynamic Graphs ☆11 · Updated 7 months ago
- ICLR 2021 ☆44 · Updated 3 years ago
- ☆38 · Updated 4 months ago
- Fast Parallel Probabilistic Graphical Model Learning and Inference [IPDPS'22, PPoPP'23, USENIX ATC'24] ☆41 · Updated this week
- ☆46 · Updated 5 months ago
- Graphiler is a compiler stack built on top of DGL and TorchScript which compiles GNNs defined using user-defined functions (UDFs) into ef… ☆60 · Updated 2 years ago
- ☆19 · Updated last year
- Surrogate-based Hyperparameter Tuning System ☆27 · Updated last year
- TensorRT LLM Benchmark Configuration ☆11 · Updated 3 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆11 · Updated 5 months ago
- GraphMineSuite (GMS): a benchmarking suite for graph mining algorithms such as graph pattern matching or graph learning ☆25 · Updated 3 years ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- Official implementation for Yuan & Liu & Zhong et al., KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark o… ☆50 · Updated last month
- LLM Serving Performance Evaluation Harness ☆56 · Updated 2 months ago
- TidalDecode: A Fast and Accurate LLM Decoding with Position Persistent Sparse Attention ☆21 · Updated last month
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆104 · Updated 11 months ago