sleepingcat4 / SophiaLinks

replacement of AdamW and Lion optimizer for LLMs

☆13

Alternatives and similar repositories for Sophia

Users who are interested in Sophia are comparing it to the libraries listed below.

the-crypt-keeper / the-muse
Experimental sampler to make LLMs more creative
☆31Updated last year
eugenepentland / landmark-attention-qlora
Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA
☆123Updated 2 years ago
VatsaDev / NanoPhi-alpha
GPT-2 small trained on phi-like data
☆66Updated last year
mayank31398 / GPTQ-for-SantaCoder
4 bits quantization of SantaCoder using GPTQ
☆51Updated 2 years ago
lachlansneff / sparsellama
☆40Updated 2 years ago
emrgnt-cmplxty / zero-shot-replication
☆74Updated last year
teknium1 / stanford_alpaca-replit
Modified Stanford-Alpaca Trainer for Training Replit's Code Model
☆41Updated 2 years ago
euclaise / SlimTrainer
Full finetuning of large language models without large memory requirements
☆94Updated last year
thooton / muse
Let's create synthetic textbooks together :)
☆75Updated last year
desik1998 / MathWithLLMs
☆49Updated last year
zarakiquemparte / zaraki-tools
☆27Updated last year
vikhyat / mixtral-inference
inference code for mixtral-8x7b-32kseqlen
☆100Updated last year
4dh / GRDN
GRDN.AI app for garden optimization
☆70Updated last year
JD-P / minihf
MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user…
☆177Updated last week
cognitivecomputations / kraken
☆66Updated last year
aspctu / alpaca-lora
Instruct-tuning LLaMA on consumer hardware
☆66Updated 2 years ago
Dhaladom / TALIS
Simple and fast server for GPTQ-quantized LLaMA inference
☆24Updated 2 years ago
Hellisotherpeople / llm_steer-oobabooga
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆43Updated last year
Digitous / ModelREVOLVER
Model REVOLVER, a human in the loop model mixing system.
☆33Updated last year
SLAM-group / newhope
☆22Updated last year
bjj / exllamav2-openai-server
An OpenAI API compatible LLM inference server based on ExLlamaV2.
☆25Updated last year
Gryphe / MergeMonster
An unsupervised model merging algorithm for Transformers-based language models.
☆105Updated last year
OpenAccess-AI-Collective / ggml-webui
Deploy your GGML models to HuggingFace Spaces with Docker and gradio
☆37Updated 2 years ago
jllllll / exllama
A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
☆64Updated last year
AlpinDale / sparsegpt-for-LLaMA
Code for the paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" with LLaMA implementation.
☆71Updated 2 years ago
geov-ai / geov
The GeoV model is a large language model designed by Georges Harik and uses Rotary Positional Embeddings with Relative distances (RoPER).…
☆121Updated 2 years ago
Birch-san / mpt-play
Command-line script for inferencing from models such as MPT-7B-Chat
☆101Updated 2 years ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆173Updated last year
the-crypt-keeper / LLooM
Experimental LLM Inference UX to aid in creative writing
☆114Updated 7 months ago
tensoic / Cerule
Cerule - A Tiny Mighty Vision Model
☆66Updated 10 months ago