PanZaifeng / RecFlex
A kernel-optimizing system for recommendation models
☆10 · Updated 2 months ago
Alternatives and similar repositories for RecFlex
Users interested in RecFlex are also comparing it to the repositories listed below.
- ☆23 · Updated 5 months ago
- An Optimizing Compiler for Recommendation Model Inference ☆25 · Updated 2 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆40 · Updated 2 years ago
- Artifacts of EVT ASPLOS'24 ☆26 · Updated last year
- Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult… ☆40 · Updated last year
- nnScaler: Compiling DNN models for Parallel Training ☆115 · Updated last week
- ☆25 · Updated 2 years ago
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity ☆217 · Updated last year
- An experimental parallel training platform ☆54 · Updated last year
- A lightweight design for computation-communication overlap. ☆160 · Updated this week
- Chimera: bidirectional pipeline parallelism for efficiently training large-scale models. ☆66 · Updated 5 months ago
- ☆75 · Updated 4 years ago
- ☆69 · Updated last year
- Compiler for Dynamic Neural Networks ☆46 · Updated last year
- ☆40 · Updated 4 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning ☆23 · Updated 3 months ago
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89 · Updated 2 years ago
- ☆150 · Updated last year
- [ICML 2024] Serving LLMs on heterogeneous decentralized clusters. ☆27 · Updated last year
- Microsoft Collective Communication Library ☆67 · Updated 9 months ago
- An Attention Superoptimizer ☆22 · Updated 7 months ago
- ☆81 · Updated 2 years ago
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆75 · Updated last week
- ☆41 · Updated last year
- InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24) ☆151 · Updated last year
- [IJCAI 2023] An automated parallel training system that combines the advantages from both data and model parallelism. If you have any inte… ☆52 · Updated 2 years ago
- ☆17 · Updated 2 years ago
- Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines ☆20 · Updated last year
- A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache. ☆57 · Updated 3 weeks ago
- PyTorch-Based Fast and Efficient Processing for Various Machine Learning Applications with Diverse Sparsity ☆114 · Updated 2 weeks ago