for-ai / parameter-efficient-moe

☆253

Alternatives and similar repositories for parameter-efficient-moe:

Users that are interested in parameter-efficient-moe are comparing it to the libraries listed below

FranxYao / Long-Context-Data-Engineering
Implementation of paper Data Engineering for Scaling Language Models to 128K Context
☆456Updated last year
neelsjain / NEFTune
Official repository of NEFTune: Noisy Embeddings Improves Instruction Finetuning
☆394Updated 10 months ago
p-lambda / dsir
DSIR large-scale data selection framework for language model training
☆244Updated last year
princeton-nlp / CEPE
[ACL 2024] Long-Context Language Modeling with Parallel Encodings
☆153Updated 9 months ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆142Updated 6 months ago
HKUNLP / ChunkLlama
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
☆398Updated 5 months ago
alon-albalak / data-selection-survey
A Survey on Data Selection for Language Models
☆220Updated 5 months ago
princeton-nlp / AutoCompressors
[EMNLP 2023] Adapting Language Models to Compress Long Contexts
☆299Updated 6 months ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated 10 months ago
microsoft / rho
Repo for Rho-1: Token-level Data Selection & Selective Pretraining of LLMs.
☆406Updated 11 months ago
sangmichaelxie / doremi
Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets
☆317Updated last year
IBM / SALMON
Self-Alignment with Principle-Following Reward Models
☆156Updated last year
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆137Updated 5 months ago
OFA-Sys / gsm8k-ScRel
Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
☆252Updated 6 months ago
voidism / DoLa
Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models"
☆476Updated 2 months ago
QwenLM / ProcessBench
☆144Updated 3 months ago
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆545Updated 3 months ago
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆147Updated 7 months ago
jongwooko / distillm
Official PyTorch implementation of DistiLLM: Towards Streamlined Distillation for Large Language Models (ICML 2024)
☆204Updated 3 weeks ago
GCYZSL / MoLA
☆131Updated 8 months ago
princeton-nlp / ProLong
Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)"
☆171Updated last month
nightdessert / Retrieval_Head
open-source code for paper: Retrieval Head Mechanistically Explains Long-Context Factuality
☆180Updated 8 months ago
OpenBMB / InfiniteBench
Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718
☆315Updated 6 months ago
allenai / reward-bench
RewardBench: the first evaluation tool for reward models.
☆537Updated last month
prateeky2806 / ties-merging
☆172Updated last year
QwenLM / AutoIF
☆264Updated 8 months ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆249Updated 3 months ago
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆155Updated 9 months ago
Dereck0602 / Awesome_Test_Time_LLMs
☆89Updated 3 weeks ago
ZubinGou / math-evaluation-harness
A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨
☆195Updated 11 months ago