apple / ml-hypercloningLinks

☆52

Alternatives and similar repositories for ml-hypercloning

Users that are interested in ml-hypercloning are comparing it to the libraries listed below

Sorting:

SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
joey00072 / ohara
Collection of autoregressive model implementation
☆86Updated 6 months ago
microsoft / mutransformers
some common Huggingface transformers in maximal update parametrization (µP)
☆86Updated 3 years ago
google-deepmind / asyncdiloco
☆46Updated last year
devvrit / matformer
MatFormer repo
☆64Updated 10 months ago
epfml / DenseFormer
☆81Updated last year
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆97Updated last week
Aleph-Alpha-Research / trigrams
☆57Updated 3 weeks ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
ContextualAI / CLAIR_and_APO
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment
☆60Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆102Updated 3 months ago
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆95Updated 5 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆102Updated 10 months ago
kevinwu23 / StanfordFineTuneBench
☆31Updated 11 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆185Updated 9 months ago
arcee-ai / DAM
☆55Updated 11 months ago
DeepAuto-AI / hip-attention
Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton.
☆147Updated 2 weeks ago
Upaya07 / NeurIPS-llm-efficiency-challenge
Code for NeurIPS LLM Efficiency Challenge
☆59Updated last year
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆121Updated 9 months ago
ml-jku / EVA
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation
☆44Updated last week
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆48Updated last year
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆72Updated 6 months ago
xrsrke / pipegoose
Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87Updated last year
euclaise / supertrainer2000
☆50Updated last year
cloneofsimo / min-fsdp
☆91Updated last year
minyoungg / LTE
☆69Updated last year
QuixiAI / spectrum
☆136Updated 2 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆58Updated last week
huggingface / kernels
Load compute kernels from the Hub
☆308Updated this week