HuangLK / transpeeder
Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism
☆224 · Updated Nov 21, 2023
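For readers comparing these repos: DeepSpeed pipeline parallelism, which transpeeder builds on, is configured partly through the DeepSpeed JSON config. The fragment below is a minimal illustrative sketch, not transpeeder's actual config; all values are assumptions for a single 8-GPU node treated as one pipeline.

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 32,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 },
  "steps_per_print": 10
}
```

In DeepSpeed, `train_batch_size` must equal `train_micro_batch_size_per_gpu` × `gradient_accumulation_steps` × the data-parallel degree (1 here, since all GPUs form a single pipeline). With the pipeline engine, `gradient_accumulation_steps` also determines how many micro-batches are kept in flight per optimizer step, and ZeRO stages 2 and 3 are not supported alongside pipeline parallelism, so the stage is capped at 1.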
Alternatives and similar repositories for transpeeder
Users interested in transpeeder are comparing it to the libraries listed below.
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP ☆98 · Updated Feb 5, 2024
- A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on core code snippets. Feel free to… ☆57 · Updated Jul 4, 2023
- ☆84 · Updated Sep 9, 2023
- The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF ☆68 · Updated May 9, 2023
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆19 · Updated Jul 20, 2023
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆1,433 · Updated Mar 20, 2024
- Collaborative Training of Large Language Models in an Efficient Way ☆419 · Updated Aug 28, 2024
- Best practices for training LLaMA models in Megatron-LM ☆664 · Updated Jan 2, 2024
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆2,224 · Updated Aug 14, 2025
- Distributed trainer for LLMs ☆588 · Updated May 20, 2024
- Automatically split your PyTorch models across multiple GPUs for training & inference ☆655 · Updated Jan 2, 2024
- ☆16 · Updated Mar 30, 2024
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆73 · Updated Oct 16, 2023
- [NeurIPS'22 Spotlight] A Contrastive Framework for Neural Text Generation ☆475 · Updated Mar 7, 2024
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,696 · Updated Aug 14, 2024
- ☆17 · Updated Aug 17, 2024
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware ☆752 · Updated Sep 27, 2024
- [NeurIPS 2023] RRHF & Wombat ☆808 · Updated Sep 22, 2023
- LLaMA tuning with the Stanford Alpaca dataset using DeepSpeed and Transformers ☆50 · Updated Mar 15, 2023
- Example models using DeepSpeed ☆6,785 · Updated this week
- Ongoing research training transformer language models at scale, including: BERT & GPT-2 ☆69 · Updated Jul 20, 2023
- The Panda project, launched in May 2023, is an open-source overseas Chinese large language model effort that explores the full technology stack in the era of large models, aiming to advance innovation and collaboration in Chinese NLP ☆1,039 · Updated Oct 19, 2023
- Secrets of RLHF in Large Language Models Part I: PPO ☆1,416 · Updated Mar 3, 2024
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆643 · Updated Jan 15, 2026
- SNCSE: Contrastive Learning for Unsupervised Sentence Embedding with Soft Negative Samples ☆76 · Updated Sep 18, 2022
- NSMC, KorSTS ... fine-tunings ☆18 · Updated Feb 23, 2022
- "Why do I feel offended?" - Korean Dataset for Offensive Language Identification (EACL 2023 Findings) ☆15 · Updated May 14, 2023
- A large-scale 7B pretraining language model developed by BaiChuan-Inc. ☆5,686 · Updated Jul 18, 2024
- BELLE: Be Everyone's Large Language Model Engine (an open-source Chinese conversational large language model) ☆8,283 · Updated Oct 16, 2024
- Implementation of Chinese ChatGPT ☆288 · Updated Nov 20, 2023
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,003 · Updated Dec 6, 2024
- Chinese-LLaMA 1 & 2 and Chinese-Falcon base models; the ChatFlow Chinese dialogue model; a Chinese OpenLLaMA model; NLP pre-training / instruction fine-tuning datasets ☆3,058 · Updated Apr 14, 2024
- Implementation of the paper "Data Engineering for Scaling Language Models to 128K Context" ☆484 · Updated Mar 19, 2024
- Code for the COLING 2022 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers" ☆19 · Updated Aug 17, 2022
- A repo for distributed training of language models with Reinforcement Learning from Human Feedback (RLHF) ☆4,742 · Updated Jan 8, 2024
- Instruction fine-tuning of the BLOOM model ☆24 · Updated Jun 15, 2023
- Finetuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆117 · Updated Jun 5, 2023
- Ongoing research training transformer models at scale ☆15,162 · Updated this week
- The corpus & code for the EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error Correction" (FCGEC Chinese grammatical error correction corpus and STG model) ☆120 · Updated Dec 10, 2024