SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy code and launch discussions about the problems you have encountered.
☆57Updated 2 years ago
Alternatives and similar repositories for llama-pipeline-parallel
Users that are interested in llama-pipeline-parallel are comparing it to the libraries listed below
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings☆167Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆257Updated 11 months ago
- ☆108Updated 4 months ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight)☆181Updated 9 months ago
- ☆122Updated last year
- [ICML 2024] Selecting High-Quality Data for Training Language Models☆194Updated last year
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp.☆98Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning☆184Updated 5 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process☆30Updated last year
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆126Updated 10 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales☆32Updated 2 years ago
- Counting-Stars (★)☆83Updated 2 weeks ago
- Code for paper "Patch-Level Training for Large Language Models"☆95Updated last month
- ☆105Updated 2 years ago
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆152Updated 8 months ago
- Unofficial implementation of AlpaGasus☆93Updated 2 years ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"☆240Updated 2 months ago
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM…☆68Updated last year
- Rectified Rotary Position Embeddings☆384Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆73Updated 2 years ago
- Towards Systematic Measurement for Long Text Quality☆37Updated last year
- train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism☆225Updated 2 years ago
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆192Updated last year
- Repository of LV-Eval Benchmark☆72Updated last year
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆268Updated last year
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection☆52Updated last year
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". A general white-box KD framework for both same…☆60Updated 3 months ago
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models☆58Updated last year
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆51Updated last year