SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you encounter.
⭐ 56 · Updated 2 years ago
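The hybrid scheme described above typically splits each model replica into pipeline stages within a node and replicates that pipeline across nodes for data parallelism. As an illustrative sketch only (the keys below are standard DeepSpeed config options, but the values and the 8-GPU / 2-stage layout are assumptions, not taken from this repo), a matching config might look like:

```json
{
  "train_batch_size": 32,
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": { "stage": 1 }
}
```

In this hypothetical layout, 8 GPUs are split into 2 pipeline stages, giving a data-parallel degree of 4; the global batch size must then satisfy train_batch_size = micro_batch_per_gpu × gradient_accumulation_steps × data_parallel_degree, i.e. 32 = 1 × 8 × 4, with gradient_accumulation_steps doubling as the number of micro-batches kept in flight through the pipeline.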
Alternatives and similar repositories for llama-pipeline-parallel
Users interested in llama-pipeline-parallel are comparing it to the repositories listed below.
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models · ⭐ 78 · Updated last year
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) · ⭐ 151 · Updated 4 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings · ⭐ 155 · Updated last year
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs · ⭐ 250 · Updated 7 months ago
- ⭐ 102 · Updated 9 months ago
- [ACL 2025, Main Conference, Oral] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process · ⭐ 28 · Updated 11 months ago
- Train LLMs (BLOOM, LLaMA, Baichuan2-7B, ChatGLM3-6B) with DeepSpeed pipeline mode; faster than ZeRO/ZeRO++/FSDP · ⭐ 97 · Updated last year
- ⭐ 107 · Updated last year
- [ACL 2024] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning · ⭐ 162 · Updated 3 weeks ago
- Counting-Stars (★) · ⭐ 83 · Updated last month
- Train LLaMA on a single A100 80G node using 🤗 Transformers and 🚀 DeepSpeed pipeline parallelism · ⭐ 223 · Updated last year
- Implementation of the NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" · ⭐ 146 · Updated 4 months ago
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models · ⭐ 184 · Updated 9 months ago
- Rectified Rotary Position Embeddings · ⭐ 374 · Updated last year
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning · ⭐ 263 · Updated last year
- ⭐ 104 · Updated 2 years ago
- Repository of the LV-Eval benchmark · ⭐ 67 · Updated 10 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales · ⭐ 32 · Updated last year
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" · ⭐ 211 · Updated 4 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models · ⭐ 178 · Updated last year
- [ACL 2025] ScaleQuest: a scalable and cost-effective data synthesis method to unleash the reasoning capability of LLMs · ⭐ 63 · Updated 8 months ago
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models · ⭐ 76 · Updated 9 months ago
- Towards Systematic Measurement for Long Text Quality · ⭐ 36 · Updated 10 months ago
- A repository collecting the literature on long-context large language models, including methodologies and evaluation benchmarks · ⭐ 265 · Updated 11 months ago
- Code for "Scaling Laws of RoPE-based Extrapolation" · ⭐ 73 · Updated last year
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues · ⭐ 99 · Updated 11 months ago
- [ICML 2024] Can AI Assistants Know What They Don't Know? · ⭐ 81 · Updated last year
- Code for "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation" (EMNLP 2023 Findings) · ⭐ 42 · Updated last year
- Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models" · ⭐ 266 · Updated 10 months ago
- [ACL 2024 (Oral)] A Prospector of Long-Dependency Data for Large Language Models · ⭐ 56 · Updated 11 months ago