intelligent-machine-learning / dlrover
DLRover: An Automatic Distributed Deep Learning System
☆1,321Updated this week
Alternatives and similar repositories for dlrover:
Users who are interested in dlrover are comparing it to the libraries listed below.
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆607Updated last week
- GLake: optimizing GPU memory management and IO transmission.☆424Updated 2 months ago
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.☆807Updated this week
- FlagPerf is an open-source software platform for benchmarking AI chips.☆319Updated 3 weeks ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.☆2,443Updated this week
- Best practice for training LLaMA models in Megatron-LM☆641Updated last year
- Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.☆266Updated last year
- ☆311Updated last week
- A highly optimized LLM inference acceleration engine for Llama and its variants.☆835Updated this week
- A PyTorch Native LLM Training Framework☆698Updated last month
- Ongoing research training transformer language models at scale, including: BERT & GPT-2☆1,963Updated last month
- FlagScale is a large model toolkit based on open-sourced projects.☆209Updated this week
- A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute☆388Updated last year
- Disaggregated serving system for Large Language Models (LLMs).☆453Updated 5 months ago
- 📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉☆3,307Updated this week
- Efficient Training (including pre-training and fine-tuning) for Big Models☆573Updated 6 months ago
- BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.☆836Updated 3 weeks ago
- The road to hack SysML and become a system expert☆462Updated 4 months ago
- FlagGems is an operator library for large language models implemented in Triton Language.☆406Updated this week
- Puck is a high-performance ANN search engine☆345Updated 2 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism☆1,192Updated this week
- FlashInfer: Kernel Library for LLM Serving☆1,876Updated this week
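- AIFoundation covers the full-stack core technologies for how AI systems, from the bottom layer to the top, provide system-level support for large-model training and inference.☆759Updated this week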
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking☆378Updated this week
- An industrial extension library of pytorch to accelerate large scale model training☆15Updated last week
- optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052☆469Updated 10 months ago
- ☆593Updated 5 months ago
- LLM Inference benchmark☆381Updated 6 months ago
- ☆598Updated 7 months ago
- ☆119Updated 2 months ago