SparkJiao / llama-pipeline-parallel
A prototype repo for hybrid training with pipeline parallelism and distributed data parallelism, with comments on the core code snippets. Feel free to copy the code and open discussions about any problems you encounter.
☆53 · Updated last year
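For readers new to the setup, below is a minimal sketch of what hybrid pipeline-parallel + data-parallel training can look like with DeepSpeed's pipeline engine. The toy `Block` model, layer counts, and config values are illustrative assumptions, not code from this repository.

```python
# Minimal sketch of hybrid pipeline parallelism + data parallelism with DeepSpeed.
# The toy model, sizes, and config below are illustrative assumptions only,
# not code taken from llama-pipeline-parallel.
# Launch with the DeepSpeed launcher, e.g.: deepspeed --num_gpus=8 train.py
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule, LayerSpec

deepspeed.init_distributed()  # pipeline topology needs torch.distributed initialized


class Block(nn.Module):
    """Stand-in for a transformer layer; the repo pipelines LLaMA decoder layers."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.relu(self.ff(x))


def loss_fn(outputs, labels):
    # The last pipeline stage computes the loss from its outputs and the labels.
    return nn.functional.mse_loss(outputs, labels)


# Describe the model as a list of LayerSpecs so DeepSpeed can assign layers to stages.
layers = [LayerSpec(Block, 512) for _ in range(8)]

# With num_stages=2 on, say, 8 GPUs, DeepSpeed builds 4 data-parallel replicas of a
# 2-stage pipeline: the hybrid PP x DDP layout this repo prototypes.
model = PipelineModule(layers=layers, num_stages=2, loss_fn=loss_fn)

ds_config = {
    "train_batch_size": 32,
    "train_micro_batch_size_per_gpu": 4,  # micro-batches keep all stages busy
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

# One optimizer step: train_batch() pulls micro-batches from the iterator and runs
# the pipeline schedule (interleaved forward/backward) across stages.
inputs = torch.randn(1024, 512)
dataset = torch.utils.data.TensorDataset(inputs, inputs.clone())
loader = torch.utils.data.DataLoader(dataset, batch_size=4)
loss = engine.train_batch(data_iter=iter(loader))
```

Under this assumed launch, any GPUs beyond the number of pipeline stages become data-parallel replicas of the pipeline, which is the hybrid arrangement the repo explores for LLaMA.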
Alternatives and similar repositories for llama-pipeline-parallel:
Users interested in llama-pipeline-parallel are comparing it to the libraries listed below:
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆75 · Updated 10 months ago
- ☆93 · Updated 3 months ago
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆137 · Updated 4 months ago
- [ACL 2024] Long-Context Language Modeling with Parallel Encodings ☆154 · Updated 7 months ago
- Train llm (bloom, llama, baichuan2-7b, chatglm3-6b) with deepspeed pipeline mode. Faster than zero/zero++/fsdp. ☆92 · Updated 11 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models” ☆119 · Updated this week
- ☆92 · Updated 9 months ago
- 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training ☆97 · Updated 3 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆237 · Updated last month
- Unofficial implementation of AlpaGasus ☆90 · Updated last year
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆58 · Updated 2 months ago
- [ICML 2024] Selecting High-Quality Data for Training Language Models ☆155 · Updated 6 months ago
- Repo for the EMNLP'24 Paper "Dual-Space Knowledge Distillation for Large Language Models". ☆40 · Updated 2 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales ☆31 · Updated last year
- Fantastic Data Engineering for Large Language Models ☆64 · Updated 3 weeks ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models ☆233 · Updated 4 months ago
- An Experiment on Dynamic NTK Scaling RoPE ☆62 · Updated last year
- [ACL 2024] MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues ☆61 · Updated 5 months ago
- Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024) ☆171 · Updated 3 months ago
- Code for Scaling Laws of RoPE-based Extrapolation ☆71 · Updated last year
- Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆23 · Updated 5 months ago
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆135 · Updated 3 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆147 · Updated last month
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆71 · Updated 7 months ago
- LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models ☆73 · Updated 3 months ago
- [SIGIR'24] The official implementation code of MOELoRA. ☆143 · Updated 5 months ago
- Train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism ☆212 · Updated last year
- 🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation. ☆136 · Updated 6 months ago
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆303 · Updated 3 months ago
- Code for paper "Patch-Level Training for Large Language Models" ☆75 · Updated 2 months ago