CASIA-LM/MoDS

☆153

Alternatives and similar repositories for MoDS

Users who are interested in MoDS are comparing it to the repositories listed below.

IronBeliever / CaR
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
☆91 · Nov 13, 2024 · Updated last year
tianyi-lab / Cherry_LLM
[NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…
☆416 · Jun 25, 2025 · Updated last year
hkust-nlp / deita
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
☆599 · Dec 9, 2024 · Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆189 · Jun 25, 2025 · Updated last year
pldlgb / nuggets
☆89 · Dec 29, 2023 · Updated 2 years ago
princeton-nlp / LESS
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
☆531 · Oct 20, 2024 · Updated last year
PlusLabNLP / Active-IT
Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"
☆26 · Nov 16, 2023 · Updated 2 years ago
lunyiliu / CoachLM
Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.
☆60 · Mar 20, 2024 · Updated 2 years ago
Bolin97 / awesome-instruction-selector
Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning
☆48 · Jan 22, 2026 · Updated 5 months ago
tianyi-lab / Reflection_Tuning
[ACL'24] Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
☆368 · Sep 6, 2024 · Updated last year
GAIR-NLP / ReAlign
Reformatted Alignment
☆111 · Sep 23, 2024 · Updated last year
USTC-StarTeam / ZIP
arXiv 2024 | ZIP: entropy-law data selection for efficient LLM alignment.
☆28 · Jun 10, 2026 · Updated last month
zhang-wei-chao / DC-PDD
This repository presents the original implementation of Pretraining Data Detection for Large Language Models: A Divergence-based Calibrat…
☆23 · May 21, 2025 · Updated last year
CASIA-LM / ChineseWebText
☆186 · Nov 13, 2023 · Updated 2 years ago
meowpass / FollowComplexInstruction
Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…
☆55 · Jun 24, 2024 · Updated 2 years ago
OFA-Sys / DiverseEvol
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning
☆88 · Dec 14, 2023 · Updated 2 years ago
magpie-align / magpie
[ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data …
☆874 · Mar 17, 2025 · Updated last year
thu-coai / CritiqueLLM
☆147 · Jul 1, 2024 · Updated 2 years ago
EachSheep / RAGSynth
The implementation of RAGSynth: Synthetic Data for Robust and Faithful RAG Component Optimization
☆21 · May 26, 2025 · Updated last year
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
Blue-Raincoat / SelectIT
☆24 · Oct 14, 2024 · Updated last year
gpt4life / alpagasus
Unofficial implementation of AlpaGasus
☆94 · Sep 23, 2023 · Updated 2 years ago
alycialee / beyond-scale-language-data-diversity
☆13 · Apr 5, 2026 · Updated 3 months ago
heyblackC / BetterMixture-Top1-Solution
First-place (top-1) solution for the Tianchi algorithm competition "BetterMixture - Large Model Data Mixing Challenge"
☆33 · Jul 7, 2024 · Updated 2 years ago
fanqiwan / Explore-Instruct
EMNLP'2023: Explore-Instruct: Enhancing Domain-Specific Instruction Coverage through Active Exploration
☆36 · Mar 10, 2024 · Updated 2 years ago
RUCKBReasoning / CoT-based-Synthesizer
Official code implementation for the ACL 2025 paper: 'CoT-based Synthesizer: Enhancing LLM Performance through Answer Synthesis'
☆32 · May 19, 2025 · Updated last year
QwenLM / AutoIF
☆335 · Jul 25, 2024 · Updated last year
nlpxucan / evol-instruct
☆287 · Apr 25, 2023 · Updated 3 years ago
SqueezeAILab / LLM2LLM
[ACL 2024] LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
☆196 · Mar 25, 2024 · Updated 2 years ago
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆287 · Aug 20, 2023 · Updated 2 years ago
ZigeW / data_management_LLM
Collection of training data management explorations for large language models
☆342 · Aug 2, 2024 · Updated last year
locuslab / scaling_laws_data_filtering
☆64 · Apr 9, 2024 · Updated 2 years ago
facebookresearch / dual-system-for-visual-language-reasoning
GitHub repo for Peifeng's internship project
☆13 · Nov 7, 2023 · Updated 2 years ago
yangjianxin1 / LongQLoRA
LongQLoRA: Extend Context Length of LLMs Efficiently
☆170 · Nov 12, 2023 · Updated 2 years ago
kevinscaria / TarGEN
Targeted Data Generation with Large Language Models
☆19 · Jun 25, 2024 · Updated 2 years ago
open-compass / opencompass
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, …
☆7,218 · Updated this week
yegcjs / mixinglaws
☆113 · Jul 15, 2025 · Updated last year
mukhal / PromptRank
[ACL 2023] Few-shot Reranking for Multi-hop QA via Language Model Prompting
☆27 · Oct 19, 2025 · Updated 9 months ago
tianyi-lab / Mosaic-IT
[ACL'25] Mosaic-IT: Cost-Free Compositional Data Synthesis for Instruction Tuning
☆20 · Sep 27, 2025 · Updated 9 months ago