iwiwi / epochraft-hf-fsdp
Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP
☆11 · Updated last year
Alternatives and similar repositories for epochraft-hf-fsdp
Users interested in epochraft-hf-fsdp are comparing it to the libraries listed below.
- Checkpointable dataset utilities for foundation model training ☆32 · Updated last year
- Supports continual pre-training & instruction tuning; forked from llama-recipes ☆32 · Updated last year
- Mamba training library developed by Kotoba Technologies ☆69 · Updated last year
- LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ☆21 · Updated last year
- Ongoing research project for continual pre-training of LLMs (dense model) ☆40 · Updated 2 months ago
- [ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models ☆24 · Updated last month
- ☆31 · Updated last year
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆27 · Updated 7 months ago
- CycleQD is a framework for parameter space model merging. ☆39 · Updated 3 months ago
- ☆72 · Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆27 · Updated last year
- ☆47 · Updated last year
- ☆52 · Updated 11 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆59 · Updated 7 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 · Updated last year
- Using FlexAttention to compute attention with different masking patterns ☆43 · Updated 7 months ago
- Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models" ☆104 · Updated 3 months ago
- Swallow project: evaluation scripts for large language models ☆17 · Updated last month
- ☆33 · Updated 9 months ago
- ☆60 · Updated 11 months ago
- List of papers on Self-Correction of LLMs. ☆72 · Updated 4 months ago
- ☆16 · Updated 5 months ago
- Ongoing research training Mixture of Expert models. ☆19 · Updated 7 months ago
- ☆14 · Updated last year
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw… ☆29 · Updated last year
- A toolkit for scaling law research ⚖ ☆49 · Updated 3 months ago
- Long Context Extension and Generalization in LLMs ☆54 · Updated 7 months ago
- Japanese LLaMa experiment ☆53 · Updated 5 months ago
- Here we will test various linear attention designs. ☆60 · Updated last year