yiyepiaoling0715 / codellm-data-preprocess-pipeline

State-of-the-art industry data-processing pipeline for code LLM pre-training, fine-tuning, and DPO

☆44

Alternatives and similar repositories for codellm-data-preprocess-pipeline

Users that are interested in codellm-data-preprocess-pipeline are comparing it to the libraries listed below

thu-coai / CritiqueLLM
☆147 Updated last year
OpenBMB / UltraEval
[ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.
☆252 Updated last year
FlagOpen / Infinity-Instruct
☆49 Updated last year
multimodal-art-projection / Megatron-LM-NEO
☆40 Updated last year
zexuanqiu / CLongEval
CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models
☆45 Updated last year
OpenCoder-llm / opc_data_filtering
Heuristic filtering framework for RefineCode
☆81 Updated 8 months ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆256 Updated 11 months ago
THUDM / ChatGLM-Math
☆83 Updated last year
cavalierlulu / rag_survey
☆125 Updated last year
CLUEbenchmark / SuperCLUE-Agent
SuperCLUE-Agent: A benchmark for evaluating core agent capabilities on native Chinese tasks
☆94 Updated 2 years ago
THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆67 Updated last year
thu-coai / AutoDetect
Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.
☆44 Updated last year
RUC-GSAI / Llama-3-SynE
Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve …
☆34 Updated 5 months ago
nick7nlp / Counting-Stars
Counting-Stars (★)
☆83 Updated 5 months ago
OpenLMLab / scaling-rope
Code for Scaling Laws of RoPE-based Extrapolation
☆73 Updated 2 years ago
X-PLUG / WritingBench
WritingBench: A Comprehensive Benchmark for Generative Writing
☆131 Updated 2 months ago
LivingFutureLab / ChineseSimpleQA
☆77 Updated 9 months ago
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆97 Updated 9 months ago
CASIA-LM / MoDS
☆146 Updated last year
mutonix / RefGPT
☆98 Updated last year
crazycth / WizardLearner
Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️
☆39 Updated last year
MadeAgents / Hammer
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking
☆104 Updated 5 months ago
SkyworkAI / skywork-o1-prm-inference
☆65 Updated 11 months ago
QwenLM / AutoIF
☆315 Updated last year
OFA-Sys / InsTag
InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning
☆282 Updated 2 years ago
GAIR-NLP / ProX
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
☆263 Updated 4 months ago
OpenNLG / OpenBA
☆96 Updated 2 years ago
CASIA-LM / ChineseWebText
☆180 Updated 2 years ago
THUDM / LongReward
☆60 Updated last year
tianyi-lab / Superfiltering
[ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning
☆182 Updated 4 months ago