hydrallm / llama-moe-v1Links

☆95

Alternatives and similar repositories for llama-moe-v1

Users that are interested in llama-moe-v1 are comparing it to the libraries listed below

Sorting:

IBM / ModuleFormer
ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp…
☆225Updated 2 months ago
imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆202Updated last year
kernelmachine / cbtm
Code repository for the c-BTM paper
☆108Updated 2 years ago
jondurbin / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆79Updated last year
AblateIt / finetune-study
Comprehensive analysis of difference in performance of QLora, Lora, and Full Finetunes.
☆83Updated 2 years ago
HazyResearch / TART
TART: A plug-and-play Transformer module for task-agnostic reasoning
☆201Updated 2 years ago
LLM360 / amber-train
Pre-training code for Amber 7B LLM
☆169Updated last year
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
jondurbin / bagel
A bagel, with everything.
☆325Updated last year
haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆227Updated 2 years ago
epfml / landmark-attention
Landmark Attention: Random-Access Infinite Context Length for Transformers
☆427Updated last year
huggingface / datablations
Scaling Data-Constrained Language Models
☆342Updated 5 months ago
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆141Updated 2 years ago
sabetAI / BLoRA
batched loras
☆347Updated 2 years ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72Updated last year
SkunkworksAI / hydra-moe
☆415Updated 2 years ago
nlpxucan / evol-instruct
☆277Updated 2 years ago
huu4ontocord / MDEL
Multi-Domain Expert Learning
☆67Updated last year
huggingface / llm-swarm
Manage scalable open LLM inference endpoints in Slurm clusters
☆277Updated last year
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆209Updated last year
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated last year
CarperAI / autocrit
A repository for transformer critique learning and generation
☆89Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127Updated 2 years ago
leehanchung / lora-instruct
Finetune Falcon, LLaMA, MPT, and RedPajama on consumer hardware using PEFT LoRA
☆104Updated 6 months ago
akoksal / LongForm
Reverse Instructions to generate instruction tuning data with corpus examples
☆216Updated last year
allenai / CommonGen-Eval
Evaluating LLMs with CommonGen-Lite
☆93Updated last year
abacaj / train-with-fsdp
☆94Updated 2 years ago
orhonovich / unnatural-instructions
☆180Updated 2 years ago
FastEval / FastEval
Fast & more realistic evaluation of chat language models. Includes leaderboard.
☆189Updated last year
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆253Updated last year