wangclnlp / DeepSpeed-Chat-Extension
This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).
☆19Updated 11 months ago
Alternatives and similar repositories for DeepSpeed-Chat-Extension
Users who are interested in DeepSpeed-Chat-Extension are comparing it to the repositories listed below.
- Code for the ACL 2024 main conference paper "BatchEval: Towards Human-like Text Evaluation"☆18Updated last year
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024)☆13Updated 6 months ago
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆81Updated last year
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆54Updated 6 months ago
- Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction"☆29Updated 2 months ago
- This is my attempt to create a Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g…☆35Updated this week
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆38Updated last year
- A curated list of personalized alignment resources (continually updated).☆22Updated this week
- Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" pr…☆105Updated last year
- [ACL 2024] Code for the paper "ALaRM: Align Language Models via Hierarchical Rewards Modeling"☆25Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆141Updated 4 months ago
- [EMNLP 2024] "ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models"☆19Updated last year
- This is a unified platform for implementing and evaluating test-time reasoning mechanisms in Large Language Models (LLMs).☆19Updated 5 months ago
- Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)☆59Updated 8 months ago
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models☆53Updated 3 months ago
- Code Repo for EfficientRAG: Efficient Retriever for Multi-Hop Question Answering☆49Updated 3 months ago
- [SIGIR'24] The official implementation code of MOELoRA.☆168Updated 11 months ago
- The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning☆15Updated last month
- A Survey on the Honesty of Large Language Models☆57Updated 6 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference)☆144Updated 7 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆78Updated 5 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆125Updated last week
- Source code for our paper: "ARIA: Training Language Agents with Intention-Driven Reward Aggregation".☆18Updated last week