bigscience-workshop / Megatron-DeepSpeedLinks

Ongoing research training transformer language models at scale, including: BERT & GPT-2

☆1,426

Alternatives and similar repositories for Megatron-DeepSpeed

Users that are interested in Megatron-DeepSpeed are comparing it to the libraries listed below

Sorting:

deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,197Updated 3 months ago
bigscience-workshop / bigscience
Central place for the engineering/scaling WG: documentation, SLURM scripts and logs, compute environment and data.
☆1,006Updated last year
huggingface / transformers-bloom-inference
Fast Inference Solutions for BLOOM
☆564Updated last year
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆663Updated last year
AetherCortex / Llama-X
Open Academic Research on Improving LLaMA to SOTA LLM
☆1,614Updated 2 years ago
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,080Updated 5 months ago
OpenBMB / BMTrain
Efficient Training (including pre-training and fine-tuning) for Big Models
☆613Updated last month
NVIDIA / NeMo-Aligner
Scalable toolkit for efficient model alignment
☆847Updated 2 months ago
epfLLM / Megatron-LLM
distributed trainer for LLMs
☆584Updated last year
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,234Updated last year
alibaba / Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
☆1,457Updated 3 weeks ago
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,307Updated 9 months ago
google-research / FLAN
☆1,555Updated this week
GanjinZero / RRHF
[NIPS2023] RRHF & Wombat
☆811Updated 2 years ago
BlackSamorez / tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
☆657Updated last year
AGI-Edgerunners / LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
☆1,217Updated last year
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,644Updated last year
OpenLMLab / LOMO
LOMO: LOw-Memory Optimization
☆991Updated last year
pjlab-sys4nlp / llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
☆998Updated last year
bigscience-workshop / xmtf
Crosslingual Generalization through Multitask Finetuning
☆537Updated last year
bigscience-workshop / data-preparation
Code used for sourcing and cleaning the BigScience ROOTS corpus
☆317Updated 2 years ago
openai / lm-human-preferences
Code for the paper Fine-Tuning Language Models from Human Preferences
☆1,377Updated 2 years ago
princeton-nlp / LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆632Updated last year
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,527Updated 2 years ago
facebookresearch / cc_net
Tools to download and cleanup Common Crawl data
☆1,037Updated 2 years ago
microsoft / Tutel
Tutel MoE: Optimized Mixture-of-Experts Library, Support GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
☆945Updated last month
allenai / natural-instructions
Expanding natural instructions
☆1,026Updated last year
ruixiangcui / AGIEval
☆768Updated last year
OpenMOSS / CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
☆417Updated last year
laekov / fastmoe
A fast MoE impl for PyTorch
☆1,822Updated 10 months ago