intelligent-machine-learning / dlrover
DLRover: An Automatic Distributed Deep Learning System
☆1,441Updated this week
Alternatives and similar repositories for dlrover
Users who are interested in dlrover are comparing it to the libraries listed below.
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆749Updated this week
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.☆1,082Updated this week
- GLake: optimizing GPU memory management and IO transmission.☆460Updated last month
- Best practice for training LLaMA models in Megatron-LM☆650Updated last year
- A PyTorch Native LLM Training Framework☆806Updated 4 months ago
- A highly optimized LLM inference acceleration engine for Llama and its variants.☆886Updated this week
- A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute☆643Updated last year
- FlagScale is a large model toolkit based on open-sourced projects.☆276Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆2,071Updated last month
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆3,276Updated this week
- FlagPerf is an open-source software platform for benchmarking AI chips.☆331Updated this week
- A self-learning tutorial for CUDA High Performance Programming.☆628Updated last month
- A fast communication-overlapping library for tensor/expert parallelism on GPUs.☆928Updated last month
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.☆267Updated 2 years ago
- Community maintained hardware plugin for vLLM on Ascend☆631Updated this week
- ☆330Updated 3 months ago
- My learning notes/codes for ML SYS.☆2,184Updated this week
- Distributed RL System for LLM Reasoning☆1,248Updated 2 weeks ago
- NCCL Tests☆1,106Updated last week
- Disaggregated serving system for Large Language Models (LLMs).☆584Updated last month
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.☆864Updated 4 months ago
- The road to hack SysML and become a system expert☆483Updated 7 months ago
- Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052☆473Updated last year
- A flexible and efficient training framework for large-scale alignment tasks☆346Updated 3 months ago
- How to optimize some algorithms in CUDA.☆2,162Updated this week
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking☆961Updated this week
- FlashInfer: Kernel Library for LLM Serving☆2,966Updated this week
- ☆529Updated 11 months ago
- Efficient Training (including pre-training and fine-tuning) for Big Models☆589Updated this week
- LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training☆403Updated last week