iwiwi / epochraft
Checkpointable dataset utilities for foundation model training
☆32 · Updated 9 months ago
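To illustrate what "checkpointable dataset utilities" means in practice, here is a minimal sketch of a dataset that can save and restore its read position so training can resume mid-epoch. The class and method names below are hypothetical, not epochraft's actual API; epochraft's real interface should be taken from its own README.

```python
# Hypothetical sketch of a checkpointable streaming dataset (NOT epochraft's
# actual API). The idea: persist how far the iterator has advanced, so a
# restarted job continues from the same sample instead of replaying the epoch.

class CheckpointableSequence:
    def __init__(self, samples):
        self.samples = list(samples)
        self.position = 0  # number of samples already yielded

    def __iter__(self):
        while self.position < len(self.samples):
            sample = self.samples[self.position]
            self.position += 1
            yield sample

    def state_dict(self):
        # Saved alongside the model/optimizer checkpoint.
        return {"position": self.position}

    def load_state_dict(self, state):
        # Restore the stream position after a restart.
        self.position = state["position"]


ds = CheckpointableSequence(["a", "b", "c", "d"])
it = iter(ds)
first_two = [next(it), next(it)]   # consume part of the epoch
ckpt = ds.state_dict()             # {"position": 2}

resumed = CheckpointableSequence(["a", "b", "c", "d"])
resumed.load_state_dict(ckpt)
rest = list(resumed)               # resumes at "c"
```

A real implementation additionally has to checkpoint shuffle buffers, tokenizer state, and per-worker shard assignments, which is what makes a dedicated library worthwhile.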
Related projects
Alternatives and complementary repositories for epochraft
- Mamba training library developed by Kotoba Technologies ☆67 · Updated 8 months ago
- ☆20 · Updated last year
- Example of using Epochraft to train HuggingFace Transformers models with PyTorch FSDP ☆12 · Updated 9 months ago
- SDTT: a simple and effective distillation method for discrete diffusion models ☆15 · Updated last week
- ☆71 · Updated 6 months ago
- Ongoing research project for continual pre-training of LLMs (dense mode) ☆26 · Updated last week
- ☆12 · Updated 5 months ago
- LEIA: Facilitating Cross-Lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation ☆21 · Updated 6 months ago
- ☆14 · Updated 6 months ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 · Updated last year
- ☆50 · Updated last week
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆34 · Updated last month
- An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024) ☆10 · Updated 5 months ago
- Supports continual pre-training & instruction tuning; forked from llama-recipes ☆32 · Updated 8 months ago
- Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning?" ☆34 · Updated 7 months ago
- Triton implementation of the HyperAttention algorithm ☆46 · Updated 10 months ago
- Code for the examples presented in the talk "Training a Llama in your backyard: fine-tuning very large models on consumer hardware" given… ☆14 · Updated last year
- ☆93 · Updated last year
- This repository contains code for removing benchmark data from your training data to help combat data snooping. ☆25 · Updated last year
- ☆76 · Updated 5 months ago
- ☆24 · Updated this week
- Fast, modern, memory-efficient, and low-precision PyTorch optimizers ☆58 · Updated 3 months ago
- ☆45 · Updated 9 months ago
- ☆20 · Updated this week
- ☆51 · Updated 4 months ago
- My explorations into editing the knowledge and memories of an attention network ☆34 · Updated last year
- ☆38 · Updated 6 months ago
- Large-scale RWKV v6 inference with FLA. Capable of inference combining multiple states (pseudo MoE). Easy to deploy on Docker. Suppo… ☆16 · Updated last week
- Transformers at any scale ☆41 · Updated 9 months ago