SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you encounter.
☆55 · Updated last year
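For orientation, the hybrid layout the description names (pipeline parallelism within one model replica, data parallelism across replicas) is typically wired up with DeepSpeed's pipeline API roughly as sketched below. This is illustrative only, not code from this repo; the toy block, stage count, and `ds_config.json` path are assumptions.

```python
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec

# Toy block standing in for a real LLaMA decoder layer (hypothetical).
class ToyBlock(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return self.ff(x).relu()

deepspeed.init_distributed()

# Split 8 blocks into 2 pipeline stages. On, say, 4 GPUs, DeepSpeed
# builds two 2-GPU pipelines and treats the replicas as data-parallel
# ranks -- the hybrid PP + DDP layout.
model = PipelineModule(
    layers=[LayerSpec(ToyBlock) for _ in range(8)],
    num_stages=2,
    loss_fn=nn.MSELoss(),
)

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # assumed config with train_batch_size etc.
)

# Each step pulls (input, label) batches from an iterator and runs the
# pipeline schedule; gradients are all-reduced across pipeline replicas:
# loss = engine.train_batch(data_iter=iter(train_loader))
```

Launched with something like `deepspeed --num_gpus 4 train.py`, a 2-stage pipeline on 4 GPUs yields two pipeline replicas whose gradients are averaged across the data-parallel group, just as in DDP.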
Alternatives and similar repositories for llama-pipeline-parallel
Users interested in llama-pipeline-parallel are comparing it to the libraries listed below
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆78 · Updated last year
- ☆101 · Updated 8 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆154 · Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆146 · Updated 4 months ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode. Faster than ZeRO/ZeRO++/FSDP. ☆96 · Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆176 · Updated last year
- [ACL 2025] We introduce ScaleQuest, a scalable, novel, and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 · Updated 7 months ago
- Code for "Scaling Laws of RoPE-based Extrapolation" ☆73 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆189 · Updated 3 months ago
- Repository of the LV-Eval benchmark ☆67 · Updated 9 months ago
- [ICML 2024] The official implementation of "Rethinking Optimization and Architecture for Tiny Language Models" ☆121 · Updated 5 months ago
- Code for our paper "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) ☆42 · Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆250 · Updated 6 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆32 · Updated last year
- ☆63 · Updated 6 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆127 · Updated this week
- [ACL 2024] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆158 · Updated 9 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing" ☆78 · Updated 5 months ago
- [ACL 2025, Main Conference] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆28 · Updated 10 months ago
- ☆18 · Updated 6 months ago
- Official repository for the ACL 2025 paper "Model Extrapolation Expedites Alignment" ☆73 · Updated last month
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆45 · Updated 7 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆39 · Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆40 · Updated last year
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models ☆185 · Updated 8 months ago
- Counting-Stars (★) ☆83 · Updated 3 weeks ago
- Towards Systematic Measurement for Long Text Quality ☆35 · Updated 9 months ago
- Code for the paper "Patch-Level Training for Large Language Models" ☆85 · Updated 7 months ago
- Reformatted Alignment ☆113 · Updated 9 months ago
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism ☆221 · Updated last year