crazycth / WizardLearner
Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️
☆35Updated 11 months ago
Alternatives and similar repositories for WizardLearner:
Users who are interested in WizardLearner are comparing it to the libraries listed below:
- ☆63Updated 5 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆71Updated this week
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆71Updated last week
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆32Updated 11 months ago
- ☆138Updated this week
- ☆46Updated 10 months ago
- A Comprehensive Survey on Long Context Language Modeling☆138Updated last month
- Code for "Scaling Laws of RoPE-based Extrapolation"☆73Updated last year
- ☆106Updated last year
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning".☆119Updated 6 months ago
- ☆115Updated last week
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆57Updated 2 months ago
- A research repo for experiments about Reinforcement Finetuning☆46Updated 3 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆58Updated 4 months ago
- ☆55Updated 6 months ago
- Feeling confused about super alignment? Here is a reading list☆42Updated last year
- Reformatted Alignment☆115Updated 7 months ago
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation☆46Updated last month
- ☆102Updated 5 months ago
- Reproducing R1 for Code with Reliable Rewards☆188Updated 2 weeks ago
- An implementation of the paper "Improve Mathematical Reasoning in Language Models by Automated Process Supervision" from Google de…☆30Updated last month
- a-m-team's exploration in large language modeling☆56Updated last week
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆240Updated 6 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”.☆31Updated 4 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models☆132Updated 10 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆188Updated 2 months ago
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models☆40Updated last year
- ☆144Updated last month
- On Memorization of Large Language Models in Logical Reasoning☆65Updated last month
- ☆143Updated 10 months ago