54rt1n / shardmerge

Using Fourier interpolation to merge large language models

☆11

Alternatives and similar repositories for shardmerge

Users who are interested in shardmerge are comparing it to the libraries listed below.

jukofyork / control-vectors
Generates control vectors for use with llama.cpp in GGUF format.
☆34 Updated 8 months ago
QuixiAI / grokadamw
☆136 Updated last year
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆108 Updated last year
arcee-ai / DAM
☆55 Updated last year
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆94 Updated 2 months ago
tdrussell / qlora-pipe
A pipeline parallel training script for LLMs.
☆163 Updated 7 months ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆179 Updated last year
euclaise / supertrainer2000
☆50 Updated last year
kaiokendev / cutoff-len-is-context-len
Demonstration that finetuning a RoPE model on sequences longer than those used in pre-training adapts the model's context limit
☆63 Updated 2 years ago
zarakiquemparte / zaraki-tools
☆27 Updated 2 years ago
official-elinas / zeus-llm-trainer
Zeus LLM Trainer is a rewrite of Stanford Alpaca aiming to be the trainer for all Large Language Models
☆70 Updated 2 years ago
Hellisotherpeople / llm_steer-oobabooga
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆43 Updated last year
CoffeeVampir3 / ez-trainer
Train Llama LoRAs easily
☆31 Updated 2 years ago
reka-ai / rekaquant
☆62 Updated 4 months ago
jukofyork / transplant-vocab
Transplants vocabulary between language models, enabling the creation of draft models for speculative decoding WITHOUT retraining.
☆46 Updated last month
QuixiAI / spectrum
☆138 Updated 3 months ago
thooton / muse
Let's create synthetic textbooks together :)
☆75 Updated last year
EduardTalianu / EntropixLab
entropix style sampling + GUI
☆27 Updated last year
ChrisHayduk / qlora-multi-gpu
QLoRA with Enhanced Multi GPU Support
☆37 Updated 2 years ago
cg123 / bitnet
Modeling code for a BitNet b1.58 Llama-style model.
☆25 Updated last year
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆68 Updated 3 weeks ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆146 Updated 9 months ago
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆35 Updated last year
Zyphra / Zyda_processing
☆39 Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆100 Updated last year
kubernetes-bad / reward-composer
Lego for GRPO
☆30 Updated 6 months ago
qwopqwop200 / gptqlora
GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ
☆102 Updated 2 years ago
winstonsmith1897 / GTPO
☆34 Updated 2 months ago
Digitous / LLM-SLERP-Merge
Spherical merge of PyTorch/HF-format language models with minimal feature loss.
☆141 Updated 2 years ago