apple / ml-hypercloning
☆47 · Updated 7 months ago
Alternatives and similar repositories for ml-hypercloning
Users interested in ml-hypercloning are comparing it to the libraries listed below.
- ☆44 · Updated last year
- ☆80 · Updated last year
- ☆29 · Updated 6 months ago
- ☆47 · Updated 9 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- ☆78 · Updated 11 months ago
- Collection of autoregressive model implementations ☆85 · Updated last month
- ☆58 · Updated 2 weeks ago
- MatFormer repo ☆26 · Updated 5 months ago
- ☆55 · Updated 3 weeks ago
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆27 · Updated 3 months ago
- A repository for research on medium sized language models. ☆76 · Updated last year
- This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog po… ☆91 · Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆60 · Updated last week
- An implementation of PSGD Kron second-order optimizer for PyTorch ☆91 · Updated 2 months ago
- Utilities for Training Very Large Models ☆58 · Updated 8 months ago
- ☆95 · Updated 4 months ago
- Load compute kernels from the Hub ☆139 · Updated last week
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 7 months ago
- ☆49 · Updated 7 months ago
- Learn CUDA with PyTorch ☆21 · Updated this week
- Minimal (400 LOC) implementation of Maximum (multi-node, FSDP) GPT training ☆127 · Updated last year
- ☆49 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- Code for NeurIPS LLM Efficiency Challenge ☆58 · Updated last year
- ☆121 · Updated last month
- Source code for the collaborative reasoner research project at Meta FAIR. ☆87 · Updated last month
- ☆33 · Updated 3 months ago
- ☆13 · Updated 3 weeks ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆82 · Updated last year