facebookresearch/llm_souping

facebookresearch / llm_souping

Model souping for LLMs

☆73

Alternatives and similar repositories for llm_souping

Users who are interested in llm_souping are comparing it to the libraries listed below.

thunlp / NOSA
View on GitHub
The official implementation of NOSA
☆19 · Jun 11, 2026 · Updated last month
SakanaAI / repo
View on GitHub
RePo: Language Models with Context Re-Positioning
☆83 · Mar 30, 2026 · Updated 3 months ago
lqtrung1998 / mwp_cot_design
View on GitHub
☆14 · Oct 11, 2023 · Updated 2 years ago
iPieter / llmq
View on GitHub
A Scheduler for Batched LLM Inference
☆19 · Oct 5, 2025 · Updated 9 months ago
NVlabs / QeRL
View on GitHub
[ICLR 2026] QeRL enables RL for 32B LLMs on a single H100 GPU.
☆511 · Mar 30, 2026 · Updated 3 months ago
reka-ai / rekaquant
View on GitHub
☆63 · Jul 10, 2025 · Updated last year
multimodal-art-projection / REER_DeepWriter
View on GitHub
REverse-Engineered Reasoning for Open-Ended Generation
☆98 · Sep 10, 2025 · Updated 10 months ago
IST-DASLab / MicroAdam
View on GitHub
This repository contains code for the MicroAdam paper.
☆21 · Dec 14, 2024 · Updated last year
zoecarver / saturn-arc
View on GitHub
☆27 · Aug 16, 2025 · Updated 11 months ago
SakanaAI / DroPE
View on GitHub
Extending the Context of Pretrained LLMs by Dropping Their Positional Embedding
☆219 · Jan 12, 2026 · Updated 6 months ago
LeonLixyz / LCLM
View on GitHub
latent context language models
☆72 · Jun 9, 2026 · Updated last month
zhuchichi56 / ASFT
View on GitHub
[ICLR 2026] The official implementation of the paper “Anchored Supervised Fine-Tuning”
☆47 · Jun 19, 2026 · Updated last month
ilur98 / DGQ
View on GitHub
Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14 · Dec 27, 2023 · Updated 2 years ago
jwkirchenbauer / mtp-lm
View on GitHub
Source code to accompany research paper on training multi token prediction language models using self-distillation.
☆39 · Feb 21, 2026 · Updated 5 months ago
EvolvingLMMs-Lab / MGPO
View on GitHub
High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning
☆55 · Jul 23, 2025 · Updated last year
Aratako / Task-Vector-Merge-Optimzier
View on GitHub
☆16 · Apr 11, 2024 · Updated 2 years ago
D2I-ai / struxgpt
View on GitHub
[NeurIPS 2024] Official implementation of the paper "Enhancing LLM’s Cognition via Structurization"
☆24 · Aug 5, 2025 · Updated 11 months ago
Cornell-RelaxML / yaqa-quantization
View on GitHub
☆85 · Jun 20, 2025 · Updated last year
JindongGu / SimDis
View on GitHub
A PyTorch implementation of the ICCV 2021 workshop paper SimDis: Simple Distillation Baselines for Improving Small Self-supervised Models
☆14 · Jul 15, 2021 · Updated 5 years ago
ahnobari / ActivationInformedMerging
View on GitHub
Official repository for Activation-Informed Merging (AIM) of Large Language Models
☆24 · Feb 10, 2025 · Updated last year
zhaoxlpku / PromptCoT
View on GitHub
☆17 · Apr 10, 2025 · Updated last year
goombalab / Gather-and-Aggregate
View on GitHub
Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism"
☆16 · Apr 30, 2025 · Updated last year
wang-kee / LiNeS
View on GitHub
Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging"
☆31 · Nov 4, 2024 · Updated last year
test-time-training / e2e
View on GitHub
Official JAX implementation of End-to-End Test-Time Training for Long Context
☆627 · Feb 15, 2026 · Updated 5 months ago
ShenzhiYang2000 / TRAPO
View on GitHub
Official Repository of "[ICLR26] TRAPO: A Semi-Supervised Reinforcement Learning Framework for Boosting LLM Reasoning"
☆25 · Feb 6, 2026 · Updated 5 months ago
tilde-research / nitrobrew-release
View on GitHub
Fused KL divergence from hidden states for knowledge distillation
☆20 · Apr 28, 2026 · Updated 3 months ago
yaof20 / DenseMixer
View on GitHub
Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient
☆68 · Aug 3, 2025 · Updated 11 months ago
jaehyun513 / P2T
View on GitHub
Official implementation of Tabular Transfer Learning via Prompting LLMs (COLM 2024).
☆13 · Aug 6, 2024 · Updated last year
YuehHanChen / CoTControl
View on GitHub
[ICML 2026] An Evaluation Suite for Chain-of-Thought Controllability
☆50 · Mar 10, 2026 · Updated 4 months ago
drarijitdas / Natural-GaLore
View on GitHub
An extension to the GaLore paper, to perform Natural Gradient Descent in a low-rank subspace
☆19 · Oct 21, 2024 · Updated last year
sapientinc / data_io
View on GitHub
Data pipeline for HRM-Text pretraining
☆68 · May 21, 2026 · Updated 2 months ago
SalesforceAIResearch / CoDA
View on GitHub
Salesforce AI Research's open diffusion language model
☆65 · Jun 2, 2026 · Updated last month
facebookresearch / airs-bench
View on GitHub
AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents
☆104 · May 5, 2026 · Updated 2 months ago
CarlanLark / Lp-Reg-dev
View on GitHub
Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward
☆44 · Nov 18, 2025 · Updated 8 months ago
metterian / korean_bert_score
View on GitHub
BERT score for text generation
☆12 · Jan 15, 2025 · Updated last year
SalesforceAIResearch / LeastLoadedEP
View on GitHub
☆18 · Jun 2, 2026 · Updated last month
MadryLab / pretraining-distribution-shift-robustness
View on GitHub
☆14 · Mar 4, 2024 · Updated 2 years ago
leftthomas / ProxyAnchor
View on GitHub
A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning"
☆11 · Jan 16, 2021 · Updated 5 years ago
TomSheng21 / AdaptGuard
View on GitHub
ICCV 2023 - AdaptGuard: Defending Against Universal Attacks for Model Adaptation
☆11 · Dec 23, 2023 · Updated 2 years ago