tommasomncttn / mergenetic
Flexible library for merging large language models (LLMs) via evolutionary optimization (ACL 2025 Demo).
☆88 Updated 2 months ago
Alternatives and similar repositories for mergenetic
Users who are interested in mergenetic are comparing it to the libraries listed below.
- nanoGPT-like codebase for LLM training ☆108 Updated 4 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 Updated 11 months ago
- PyTorch library for Active Fine-Tuning ☆93 Updated 2 weeks ago
- One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation ☆43 Updated 11 months ago
- [ICLR 2025] Monet: Mixture of Monosemantic Experts for Transformers ☆73 Updated 3 months ago
- ☆34 Updated 10 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆92 Updated 10 months ago
- ☆80 Updated this week
- State-of-the-art paired encoder and decoder models (17M-1B params) ☆50 Updated 2 months ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆29 Updated last week
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆60 Updated last year
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆136 Updated 3 months ago
- A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers. ☆46 Updated 3 months ago
- ☆51 Updated 6 months ago
- Sparse Autoencoder Training Library ☆54 Updated 5 months ago
- Official codebase for "Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions" (Matrenok … ☆27 Updated 2 months ago
- Official implementation of "BERTs are Generative In-Context Learners" ☆32 Updated 6 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆81 Updated 10 months ago
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated 11 months ago
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆147 Updated last week
- Code for Zero-Shot Tokenizer Transfer ☆138 Updated 8 months ago
- ☆142 Updated last month
- Minimum Description Length probing for neural network representations ☆20 Updated 8 months ago
- Steering vectors for transformer language models in Pytorch / Huggingface ☆125 Updated 7 months ago
- Attribution-based Parameter Decomposition ☆31 Updated 4 months ago
- https://footprints.baulab.info ☆17 Updated last year
- ☆57 Updated last week
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- Landing repository for the paper "Softpick: No Attention Sink, No Massive Activations with Rectified Softmax" ☆85 Updated last month
- Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl… ☆75 Updated last year