apple / ml-hypercloning
☆47 · Updated 6 months ago
Alternatives and similar repositories for ml-hypercloning:
Users interested in ml-hypercloning are comparing it to the libraries listed below.
- Some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- Code for NeurIPS LLM Efficiency Challenge ☆57 · Updated last year
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆23 · Updated 3 months ago
- Simple repository for training small reasoning models ☆27 · Updated 3 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆27 · Updated 7 months ago
- Set of scripts to finetune LLMs ☆37 · Updated last year
- Collection of autoregressive model implementations ☆85 · Updated 2 weeks ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] ☆59 · Updated 6 months ago
- Experiments for efforts to train a new and improved T5 ☆77 · Updated last year
- NanoGPT (124M) quality in 2.67B tokens ☆28 · Updated this week
- A repository for research on medium-sized language models ☆76 · Updated 11 months ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆39 · Updated 6 months ago
- Simple GRPO scripts and configurations ☆58 · Updated 3 months ago
- Large language models (LLMs) made easy; EasyLM is a one-stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆72 · Updated 8 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆114 · Updated this week
- MEXMA: Token-level objectives improve sentence representations ☆41 · Updated 4 months ago
- prime-rl is a codebase for decentralized RL training at scale ☆89 · Updated this week
- Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers" ☆37 · Updated last year