Alternatives and similar repositories for parameter-efficient-moe
☆274 · Oct 31, 2023 · Updated 2 years ago
Users that are interested in parameter-efficient-moe are comparing it to the libraries listed below.
- [SIGIR'24] The official implementation code of MOELoRA. ☆188 · Jul 22, 2024 · Updated last year
- ☆176 · Jul 22, 2024 · Updated last year
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆1,001 · Dec 6, 2024 · Updated last year
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆401 · Apr 29, 2024 · Updated last year
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24) ☆145 · Sep 20, 2024 · Updated last year
- Mixture-of-Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations. ☆12 · Feb 11, 2024 · Updated 2 years ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,660 · Mar 8, 2024 · Updated last year
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 5 months ago
- ☆415 · Nov 2, 2023 · Updated 2 years ago
- batched loras ☆350 · Sep 6, 2023 · Updated 2 years ago
- LongQLoRA: Extend Context Length of LLMs Efficiently ☆168 · Nov 12, 2023 · Updated 2 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆279Nov 3, 2023Updated 2 years ago
- [ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"☆102Jun 20, 2025Updated 8 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline"☆118Jun 15, 2024Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 5 months ago
- Repository for Skill Set Optimization☆14Jul 26, 2024Updated last year
- ☆37Oct 10, 2024Updated last year
- [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition☆669Jul 22, 2024Updated last year
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆588Dec 9, 2024Updated last year
- Official repository of NEFTune: Noisy Embeddings Improve Instruction Finetuning ☆409 · May 17, 2024 · Updated last year
- Codebase for Merging Language Models (ICML 2024) ☆863 · May 5, 2024 · Updated last year
- Serving multiple LoRA-finetuned LLMs as one ☆1,145 · May 8, 2024 · Updated last year
- LongLLaMA is a large language model capable of handling long contexts. It is based on OpenLLaMA and fine-tuned with the Focused Transform… ☆1,463 · Nov 7, 2023 · Updated 2 years ago
- A collection of AWESOME things about mixture-of-experts ☆1,266 · Dec 8, 2024 · Updated last year
- Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral) ☆2,693 · Aug 14, 2024 · Updated last year
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM ☆60 · May 28, 2024 · Updated last year
- Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates ☆473 · Apr 21, 2024 · Updated last year
- ☆96 · Oct 8, 2023 · Updated 2 years ago
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆209 · May 20, 2024 · Updated last year
- ☆30 · Sep 28, 2023 · Updated 2 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- Implementation of DoRA ☆308 · Jun 7, 2024 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆114 · May 2, 2022 · Updated 3 years ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,899 · Jan 21, 2024 · Updated 2 years ago
- ☆29 · Jan 23, 2024 · Updated 2 years ago
- Code for PHATGOOSE introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆91 · Feb 27, 2024 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- Code for the paper "Towards the Law of Capacity Gap in Distilling Language Models" ☆102 · Jul 9, 2024 · Updated last year
- ☆301 · Jul 10, 2025 · Updated 7 months ago