alibaba / Pai-Megatron-Patch

The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud.

☆1,269

Alternatives and similar repositories for Pai-Megatron-Patch

Users who are interested in Pai-Megatron-Patch are comparing it to the libraries listed below.

alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆659 Updated last year
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,126 Updated 3 weeks ago
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆336 Updated this week
alibaba / ROLL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
☆1,605 Updated last week
vllm-project / vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
☆946 Updated last week
feifeibear / LLMSpeculativeSampling
Fast inference from large language models via speculative decoding
☆795 Updated 11 months ago
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆830 Updated last week
inclusionAI / AReaL
Distributed RL System for LLM Reasoning
☆2,135 Updated this week
OpenBMB / BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
☆604 Updated 2 months ago
THUDM / LongBench
LongBench v2 and LongBench (ACL 25'&24')
☆940 Updated 6 months ago
pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆978 Updated 8 months ago
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆400 Updated this week
Unakar / Logic-RL
Reproduce R1 Zero on Logic Puzzle
☆2,384 Updated 4 months ago
modelscope / evalscope
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
☆1,427 Updated this week
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆541 Updated 3 weeks ago
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,406 Updated last year
SafeAILab / EAGLE
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
☆1,446 Updated last week
THUDM / slime
slime is an LLM post-training framework aiming for RL Scaling.
☆1,113 Updated this week
zhanshijinwat / Steel-LLM
Train a 1B LLM on 1T tokens from scratch as an individual
☆707 Updated 3 months ago
InternLM / InternEvo
InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencie…
☆402 Updated 2 weeks ago
Qihoo360 / Light-R1
☆733 Updated 2 months ago
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆831 Updated this week
Open-Reasoner-Zero / Open-Reasoner-Zero
Official Repo for Open-Reasoner-Zero
☆2,015 Updated 2 months ago
volcengine / veScale
A PyTorch Native LLM Training Framework
☆839 Updated 3 weeks ago
OpenMOSS / CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
☆418 Updated 11 months ago
zhaochenyang20 / Awesome-ML-SYS-Tutorial
My learning notes and code for ML SYS.
☆3,153 Updated last week
GAIR-NLP / O1-Journey
O1 Replication Journey
☆1,998 Updated 6 months ago
alipay / PainlessInferenceAcceleration
Accelerate inference without tears
☆322 Updated 4 months ago
mindspore-lab / mindformers
☆172 Updated this week
openreasoner / openr
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
☆1,805 Updated 6 months ago