iwiwi / epochraft-hf-fsdpLinks

Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP

☆11

Alternatives and similar repositories for epochraft-hf-fsdp

Users that are interested in epochraft-hf-fsdp are comparing it to the libraries listed below

Sorting:

kotoba-tech / kotoba-recipes
Support Continual pre-training & Instruction Tuning forked from llama-recipes
☆32Updated last year
iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32Updated last year
kotoba-tech / kotomamba
Mamba training library developed by kotoba technologies
☆71Updated last year
leia-llm / leia
LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
☆21Updated last year
PythonNut / superbpe
Official code release for "SuperBPE: Space Travel for Language Models"
☆61Updated this week
berlino / seq_icl
☆53Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆75Updated 8 months ago
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆77Updated last year
okoge-kaz / moe-recipes
Ongoing research training Mixture of Expert models.
☆19Updated 10 months ago
sustcsonglin / mamba-triton
☆48Updated last year
SakanaAI / TAID
Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models"
☆111Updated 5 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆57Updated 9 months ago
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆50Updated 5 months ago
swallow-llm / swallow-evaluation
Swallowプロジェクト大規模言語モデル評価スクリプト
☆19Updated 3 months ago
GSYfate / knnlm-limits
Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"
☆23Updated 2 months ago
JonasGeiping / linear_cross_entropy_loss
A fusion of a linear layer and a cross entropy loss, written for pytorch in triton.
☆68Updated 11 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆58Updated 2 weeks ago
hadasah / btm
☆75Updated last year
sjelassi / transformers_ssm_copy
☆32Updated last year
bminixhofer / zett
Code for Zero-Shot Tokenizer Transfer
☆133Updated 6 months ago
mgmalek / efficient_cross_entropy
☆112Updated last year
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆91Updated 2 months ago
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆82Updated last year
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆59Updated 9 months ago
ku-nlp / ja-vicuna-qa-benchmark
☆33Updated 11 months ago
insuhan / hyper-attn
☆81Updated last year
Doraemonzzz / Awesome-Triton-Resources
Awesome Triton Resources
☆32Updated 2 months ago
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30Updated 2 weeks ago
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97Updated last year
okoge-kaz / llm-recipes
Ongoing Research Project for continaual pre-training LLM(dense mode)
☆42Updated 4 months ago