apple / ml-hypercloning
☆47 · Updated 7 months ago
Alternatives and similar repositories for ml-hypercloning
Users who are interested in ml-hypercloning are comparing it to the libraries listed below.
- Triton Implementation of HyperAttention Algorithm ☆48 · Updated last year
- some common Huggingface transformers in maximal update parametrization (µP) ☆81 · Updated 3 years ago
- ☆81 · Updated last year
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS … ☆59 · Updated 8 months ago
- ☆47 · Updated 9 months ago
- Collection of autoregressive model implementation ☆85 · Updated 2 months ago
- ☆44 · Updated last year
- Code for NeurIPS LLM Efficiency Challenge ☆59 · Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆71 · Updated this week
- Experiments for efforts to train a new and improved t5 ☆77 · Updated last year
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆57 · Updated 9 months ago
- Using FlexAttention to compute attention with different masking patterns ☆44 · Updated 9 months ago
- ☆49 · Updated last year
- ☆56 · Updated last month
- Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi… ☆27 · Updated 4 months ago
- ☆61 · Updated 3 weeks ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 · Updated last year
- DPO, but faster 🚀 ☆43 · Updated 6 months ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆84 · Updated last year
- ☆51 · Updated 7 months ago
- MatFormer repo ☆31 · Updated 6 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 · Updated 6 months ago
- ☆29 · Updated 5 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- Utilities for Training Very Large Models ☆58 · Updated 9 months ago
- ☆78 · Updated 11 months ago
- Simple GRPO scripts and configurations. ☆58 · Updated 4 months ago
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆80 · Updated last month
- ☆133 · Updated 10 months ago
- Simple repository for training small reasoning models ☆33 · Updated 4 months ago