intelligent-machine-learning / dlrover
DLRover: An Automatic Distributed Deep Learning System
☆1,272Updated this week
Related projects
Alternatives and complementary repositories for dlrover
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.☆721Updated this week
- GLake: optimizing GPU memory management and IO transmission.☆379Updated 3 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆1,120Updated 3 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆545Updated last month
- A PyTorch Native LLM Training Framework☆665Updated 2 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips.☆313Updated this week
- Best practice for training LLaMA models in Megatron-LM☆628Updated 10 months ago
- The road to hack SysML and become a system expert☆437Updated last month
- ☆289Updated this week
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.☆263Updated last year
- FlagScale is a large model toolkit based on open-sourced projects.☆169Updated this week
- A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute☆311Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052☆457Updated 8 months ago
- ☆290Updated 4 months ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆1,893Updated last month
- Disaggregated serving system for Large Language Models (LLMs).☆359Updated 3 months ago
- veRL: Volcano Engine Reinforcement Learning for LLM☆318Updated this week
- Efficient Training (including pre-training and fine-tuning) for Big Models☆564Updated 3 months ago
- 📚Modern CUDA Learn Notes with PyTorch: Tensor/CUDA Cores, 📖150+ CUDA Kernels with PyTorch bindings, 📖HGEMM/SGEMM (95%~99% cuBLAS perfo…☆1,473Updated this week
- LiBai(李白): A Toolbox for Large-Scale Distributed Parallel Training☆390Updated this week
- FlashInfer: Kernel Library for LLM Serving☆1,452Updated this week
- LLM Inference benchmark☆350Updated 3 months ago
- ☆379Updated last week
- ☆144Updated this week
- Efficient AI Inference & Serving☆458Updated 10 months ago
- ☆197Updated last year
- Large Language Model (LLM) Systems Paper List☆645Updated this week
- How to optimize some algorithms in CUDA.☆1,593Updated last week
- ☆114Updated last week
- ☆209Updated last year