pjlab-sys4nlp / llama-moeLinks

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

☆994

Alternatives and similar repositories for llama-moe

Users that are interested in llama-moe are comparing it to the libraries listed below

Sorting:

princeton-nlp / LLM-Shearing
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
☆631Updated last year
THUDM / LongBench
LongBench v2 and LongBench (ACL 25'&24')
☆997Updated 9 months ago
princeton-nlp / SimPO
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
☆923Updated 8 months ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆747Updated last year
Strivin0311 / long-llms-learning
A repository sharing the literatures about long-context large language models, including the methodologies and the evaluation benchmarks
☆268Updated last year
yule-BUAA / MergeLM
Codebase for Merging Language Models (ICML 2024)
☆853Updated last year
feifeibear / LLMSpeculativeSampling
Fast inference from large lauguage models via speculative decoding
☆841Updated last year
horseee / LLM-Pruner
[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baich…
☆1,072Updated last year
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆496Updated last year
OpenMOSS / CoLLiE
Collaborative Training of Large Language Models in an Efficient Way
☆414Updated last year
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆572Updated 10 months ago
AGI-Edgerunners / LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
☆1,201Updated last year
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆659Updated last year
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,288Updated 7 months ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆782Updated 7 months ago
QingruZhang / AdaLoRA
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning (ICLR 2023).
☆353Updated 2 years ago
bojone / rerope
Rectified Rotary Position Embeddings
☆381Updated last year
HKUNLP / ChunkLlama
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
☆440Updated last year
LLaMafia / llamafia.github
☆321Updated last year
TencentARC / LLaMA-Pro
[ACL 2024] Progressive LLaMA with Block Expansion.
☆510Updated last year
XueFuzhao / OpenMoE
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
☆1,614Updated last year
multimodal-art-projection / MAP-NEO
☆964Updated 8 months ago
neelsjain / NEFTune
Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning
☆401Updated last year
sangmichaelxie / doremi
Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
☆342Updated last year
fanqiwan / FuseAI
FuseAI Project
☆583Updated 9 months ago
sail-sg / lorahub
[COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
☆655Updated last year
open-compass / MixtralKit
A toolkit for inference and evaluation of 'mixtral-8x7b-32kseqlen' from Mistral AI
☆771Updated last year
SimpleBerry / LLaMA-O1
Large Reasoning Models
☆805Updated 10 months ago
epfLLM / Megatron-LLM
distributed trainer for LLMs
☆581Updated last year
FranxYao / Long-Context-Data-Engineering
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
☆477Updated last year