iwiwi / epochraftLinks

Checkpointable dataset utilities for foundation model training

☆32

Alternatives and similar repositories for epochraft

Users that are interested in epochraft are comparing it to the libraries listed below

Sorting:

kotoba-tech / kotomamba
Mamba training library developed by kotoba technologies
☆70Updated last year
iwiwi / epochraft-hf-fsdp
Example of using Epochraft to train HuggingFace transformers models with PyTorch FSDP
☆11Updated last year
cat-state / tinypar
☆20Updated 2 years ago
hadasah / btm
☆76Updated last year
huggingface / peft-pytorch-conference
Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given…
☆14Updated 2 years ago
apoorvkh / torchrunx
Easily run PyTorch on multiple GPUs & machines
☆52Updated this week
ryokamoi / llm-self-correction-papers
List of papers on Self-Correction of LLMs.
☆80Updated 10 months ago
Aratako / Task-Vector-Merge-Optimzier
☆15Updated last year
google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆35Updated 2 years ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆116Updated 2 months ago
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆53Updated 9 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35Updated 2 years ago
yikangshen / megablocks
☆20Updated last year
eth-easl / fmengine
Utilities for Training Very Large Models
☆58Updated last year
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆47Updated last year
SakanaAI / CycleQD
CycleQD is a framework for parameter space model merging.
☆44Updated 9 months ago
sustcsonglin / gated_linear_attention_layer
☆31Updated last year
sunyt32 / torchscale
Transformers at any scale
☆41Updated last year
ahennequ / pytorch-custom-mma
☆29Updated 3 years ago
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Updated last year
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆38Updated 9 months ago
turingmotors / vlm-recipes
☆20Updated last year
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32Updated last year
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆85Updated last year
kotoba-tech / kotoba-recipes
Support Continual pre-training & Instruction Tuning forked from llama-recipes
☆33Updated last year
hitachi-nlp / FLD-corpus
☆18Updated 11 months ago
HazyResearch / prefix-linear-attention
☆57Updated last year
SakanaAI / TAID
Official implementation of "TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models"
☆118Updated last month
NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/…
☆28Updated last year