apple / ml-hypercloning
☆52 · Updated last year
Alternatives and similar repositories for ml-hypercloning
Users interested in ml-hypercloning are comparing it to the libraries listed below.
- Some common Huggingface transformers in maximal update parametrization (µP) ☆87 · Updated 3 years ago
- ☆47 · Updated last year
- ☆82 · Updated last year
- Collection of autoregressive model implementations ☆85 · Updated 7 months ago
- ☆48 · Updated last year
- Experiments for efforts to train a new and improved T5 ☆76 · Updated last year
- A repository for research on medium-sized language models ☆78 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆60 · Updated last year
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆87 · Updated last year
- ☆50 · Updated last year
- Train, tune, and infer the Bamba model ☆137 · Updated 6 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 · Updated last year
- Explorations into the proposal from the paper "Grokfast: Accelerated Grokking by Amplifying Slow Gradients" ☆103 · Updated 11 months ago
- ☆105 · Updated 4 months ago
- Google TPU optimizations for transformers models ☆124 · Updated 10 months ago
- Code for the NeurIPS LLM Efficiency Challenge ☆59 · Updated last year
- Implementation of the Llama architecture with RLHF + Q-learning ☆168 · Updated 10 months ago
- ☆58 · Updated 3 weeks ago
- MatFormer repo ☆66 · Updated last year
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆132 · Updated last month
- Triton implementation of the HyperAttention algorithm ☆48 · Updated 2 years ago
- ☆138 · Updated 3 months ago
- ☆70 · Updated last year
- ☆55 · Updated last year
- Lightweight toolkit package to train and fine-tune 1.58-bit language models ☆103 · Updated 6 months ago
- Train a SmolLM-style LLM on fineweb-edu in JAX/Flax with an assortment of optimizers ☆18 · Updated 4 months ago
- ☆136 · Updated last year
- Set of scripts to finetune LLMs ☆38 · Updated last year
- JORA: JAX Tensor-Parallel LoRA Library (ACL 2024) ☆36 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆111 · Updated 7 months ago