Train LLaMA on a single A100 80GB node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism
☆224 · Nov 21, 2023 · Updated 2 years ago
Alternatives and similar repositories for transpeeder
Users who are interested in transpeeder are comparing it to the libraries listed below.
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆97 · Feb 5, 2024 · Updated 2 years ago
- A prototype repo for hybrid pipeline-parallel and distributed-data-parallel training, with comments on core code snippets. Feel free to… ☆58 · Jul 4, 2023 · Updated 2 years ago
- ☆84 · Sep 9, 2023 · Updated 2 years ago
- The complete training code for the open-source, high-performance Llama model, covering the full process from pre-training to RLHF. ☆69 · May 9, 2023 · Updated 3 years ago
- LLaMA tuning with the Stanford Alpaca dataset using DeepSpeed and Transformers ☆49 · Mar 15, 2023 · Updated 3 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆19 · Jul 20, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,446 · Mar 20, 2024 · Updated 2 years ago
- Best practices for training LLaMA models in Megatron-LM ☆665 · Jan 2, 2024 · Updated 2 years ago
- Automatically split your PyTorch models on multiple GPUs for training & inference ☆656 · Jan 2, 2024 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆2,253 · Aug 14, 2025 · Updated 10 months ago
- Collaborative Training of Large Language Models in an Efficient Way ☆420 · Aug 28, 2024 · Updated last year
- Distributed trainer for LLMs ☆589 · May 20, 2024 · Updated 2 years ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆73 · Oct 16, 2023 · Updated 2 years ago
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Jul 20, 2023 · Updated 2 years ago
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,689 · Aug 14, 2024 · Updated last year
- The Panda project is an open-source, overseas Chinese large language model project launched in May 2023. It is dedicated to exploring the entire technology stack of the large-model era and aims to promote innovation and collaboration in Chinese natural language processing. ☆1,033 · Oct 19, 2023 · Updated 2 years ago
- Example models using DeepSpeed ☆6,823 · May 20, 2026 · Updated last month
- Secrets of RLHF in Large Language Models Part I: PPO ☆1,426 · Mar 3, 2024 · Updated 2 years ago
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆117 · Jun 5, 2023 · Updated 3 years ago
- [NIPS2023] RRHF & Wombat ☆806 · Sep 22, 2023 · Updated 2 years ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆673 · May 21, 2026 · Updated last month
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆760 · Sep 27, 2024 · Updated last year
- BELLE: Be Everyone's Large Language model Engine (open-source Chinese conversational large language model) ☆8,272 · Oct 16, 2024 · Updated last year
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆478 · Mar 7, 2024 · Updated 2 years ago
- Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; ChatFlow Chinese conversational model; Chinese OpenLLaMA model; NLP pre-training/instruction-tuning datasets ☆3,050 · Apr 14, 2024 · Updated 2 years ago
- Finetuning LLaMA with DeepSpeed ☆10 · Apr 14, 2023 · Updated 3 years ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,728 · Apr 17, 2024 · Updated 2 years ago
- A large-scale 7B pretraining language model developed by BaiChuan-Inc. ☆5,654 · Jul 18, 2024 · Updated last year
- Pipeline Parallelism for PyTorch ☆785 · Aug 21, 2024 · Updated last year
- Ongoing research training transformer models at scale ☆16,761 · Updated this week
- A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF) ☆4,749 · Jan 8, 2024 · Updated 2 years ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Dec 6, 2024 · Updated last year
- Implementation of the paper Data Engineering for Scaling Language Models to 128K Context ☆497 · Mar 19, 2024 · Updated 2 years ago
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- Implementation of Chinese ChatGPT ☆287 · Nov 20, 2023 · Updated 2 years ago
- [ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models" ☆450 · Oct 16, 2024 · Updated last year
- An implementation of an autoregressive language model using an improved Transformer and DeepSpeed pipeline parallelism. ☆29 · Jan 12, 2026 · Updated 5 months ago
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples ☆76 · Sep 18, 2022 · Updated 3 years ago
- WanJuan (万卷) 1.0 multimodal corpus ☆573 · Oct 20, 2023 · Updated 2 years ago