SakanaAI / evolutionary-model-merge
Official repository of Evolutionary Optimization of Model Merging Recipes
☆1,339 Updated 6 months ago
Alternatives and similar repositories for evolutionary-model-merge
Users who are interested in evolutionary-model-merge are comparing it to the libraries listed below.
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆883 Updated last month
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,386 Updated last year
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection ☆1,572 Updated 7 months ago
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,166 Updated last year
- Codebase for Merging Language Models (ICML 2024) ☆833 Updated last year
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch ☆1,842 Updated 2 months ago
- Stanford NLP Python library for Representation Finetuning (ReFT) ☆1,490 Updated 4 months ago
- Code for Quiet-STaR ☆734 Updated 10 months ago
- Reaching LLaMA2 Performance with 0.1M Dollars ☆983 Updated 11 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆760 Updated last week
- Stanford NLP Python library for understanding and improving PyTorch models via interventions ☆756 Updated 3 weeks ago
- Tools for merging pretrained large language models. ☆5,853 Updated last week
- Training LLMs with QLoRA + FSDP ☆1,487 Updated 7 months ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,545 Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,258 Updated 3 months ago
- A PyTorch native platform for training generative AI models ☆3,953 Updated this week
- Schedule-Free Optimization in PyTorch ☆2,180 Updated last month
- ☆1,025 Updated 6 months ago
- Minimalistic large language model 3D-parallelism training ☆1,942 Updated this week
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆925 Updated last year
- PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention… ☆289 Updated last year
- The official implementation of “Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training” ☆963 Updated last year
- ☆447 Updated last year
- Code for BLT research paper ☆1,686 Updated last month
- [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333 ☆1,115 Updated last year
- A bibliography and survey of the papers surrounding o1 ☆1,201 Updated 7 months ago
- [ICML'24 Spotlight] LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning ☆653 Updated last year
- Official repository for ORPO ☆455 Updated last year
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆859 Updated 2 weeks ago
- AllenAI's post-training codebase ☆3,028 Updated this week