apple / ml-hypercloning
☆44 Updated 3 months ago
Alternatives and similar repositories for ml-hypercloning:
Users who are interested in ml-hypercloning are comparing it to the libraries listed below.
- ☆27 Updated 3 months ago
- Some common Hugging Face transformers in maximal update parametrization (µP) ☆78 Updated 2 years ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆36 Updated 4 months ago
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆82 Updated last week
- ☆42 Updated last year
- ☆75 Updated 7 months ago
- ☆48 Updated 3 months ago
- ☆78 Updated 10 months ago
- ☆47 Updated 5 months ago
- Collection of autoregressive model implementations ☆81 Updated this week
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆81 Updated last year
- ☆53 Updated last year
- Triton Implementation of HyperAttention Algorithm ☆46 Updated last year
- Code for NeurIPS LLM Efficiency Challenge ☆55 Updated 10 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆95 Updated last month
- ☆121 Updated last week
- ☆47 Updated 2 months ago
- Train, tune, and infer Bamba model ☆84 Updated last month
- Set of scripts to finetune LLMs ☆36 Updated 10 months ago
- Experiments for efforts to train a new and improved t5 ☆77 Updated 10 months ago
- ☆49 Updated 11 months ago
- ☆33 Updated 5 months ago
- Utilities for Training Very Large Models ☆57 Updated 4 months ago
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning ☆59 Updated 6 months ago
- MEXMA: Token-level objectives improve sentence representations ☆40 Updated last month
- LLM training in simple, raw C/CUDA ☆14 Updated 2 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 Updated 5 months ago
- ☆54 Updated 3 months ago
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆82 Updated 7 months ago
- A fast implementation of T5/UL2 in PyTorch using Flash Attention ☆82 Updated 3 weeks ago