SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training that combines pipeline parallelism with distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you have encountered.
☆49 Updated last year
Related projects
Alternatives and complementary repositories for llama-pipeline-parallel
- ☆88 Updated last month
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆73 Updated 8 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆144 Updated 5 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆217 Updated 6 months ago
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆88 Updated last month
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆125 Updated 2 months ago
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp. ☆90 Updated 9 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆138 Updated 2 months ago
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆51 Updated 3 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆118 Updated 4 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆30 Updated last year
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆38 Updated 8 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆70 Updated last year
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". ☆37 Updated 2 weeks ago
- ☆89 Updated 7 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆76 Updated last month
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆145 Updated 5 months ago
- Unofficial implementation of AlpaGasus ☆84 Updated last year
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models ☆167 Updated last month
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆51 Updated 3 weeks ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆118 Updated 3 weeks ago
- ☆40 Updated 5 months ago
- Fantastic Data Engineering for Large Language Models ☆50 Updated 3 months ago
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" ☆91 Updated 4 months ago
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment ☆66 Updated 5 months ago
- ☆53 Updated 4 months ago
- Counting-Stars (★) ☆76 Updated 2 months ago
- Repository of LV-Eval Benchmark ☆48 Updated 2 months ago
- Towards Systematic Measurement for Long Text Quality ☆28 Updated 2 months ago
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆285 Updated last month