schinger / FullLLM
Full stack LLM (Pre-training/finetuning, PPO(RLHF), Inference, Quant, etc.)
☆30 · Updated 9 months ago
Alternatives and similar repositories for FullLLM
Users that are interested in FullLLM are comparing it to the libraries listed below
- Reinforcement Learning in LLM and NLP. ☆61 · Updated 3 months ago
- Fine-tuning LLaMA with RLHF (Reinforcement Learning from Human Feedback) based on DeepSpeed Chat ☆116 · Updated 2 years ago
- Code for a New Loss for Mitigating the Bias of Learning Difficulties in Generative Language Models ☆66 · Updated 9 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆144 · Updated last month
- Custom reward development on verl ☆135 · Updated 6 months ago
- Train your GRPO with zero dataset and low resources; 8-bit/4-bit/LoRA/QLoRA supported, multi-GPU supported ... ☆79 · Updated 7 months ago
- ☆146 · Updated last year
- Related works and background techniques for OpenAI o1 ☆221 · Updated 11 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training. ☆272 · Updated 9 months ago
- a-m-team's exploration in large language modeling ☆194 · Updated 6 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems ☆93 · Updated last month
- Fine-tuning large language models with the DPO algorithm; simple and easy to get started. ☆48 · Updated last year
- Collection of papers for scalable automated alignment. ☆94 · Updated last year
- How to train an LLM tokenizer ☆154 · Updated 2 years ago
- Train an LLM from scratch on a single 24 GB GPU ☆55 · Updated 5 months ago
- ☆78 · Updated this week
- ☆392 · Updated last month
- Efficient, Low-Resource, Distributed transformer implementation based on BMTrain ☆263 · Updated 2 years ago
- ☆319 · Updated 6 months ago
- llm & rl ☆258 · Updated last month
- ☆19 · Updated last year
- ☆39 · Updated 9 months ago
- ☆80 · Updated 2 weeks ago
- ☆47 · Updated 10 months ago
- ☆115 · Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆56 · Updated last year
- Fantastic Data Engineering for Large Language Models ☆92 · Updated 11 months ago
- A collection of phenomena observed during the scaling of big foundation models, which may be developed into consensus, principles, or l… ☆285 · Updated 2 years ago
- ☆147 · Updated last year
- LeetCode Training and Evaluation Dataset ☆43 · Updated 7 months ago