wangclnlp / DeepSpeed-Chat-Extension
This repo contains some extensions of deepspeed-chat for fine-tuning LLMs (SFT+RLHF).
☆20 · Updated last year
Alternatives and similar repositories for DeepSpeed-Chat-Extension
Users who are interested in DeepSpeed-Chat-Extension are comparing it to the libraries listed below.
- Code for ACL 2024 (main): BatchEval: Towards Human-like Text Evaluation ☆19 · Updated last year
- [ICLR 2025] Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization" ☆13 · Updated last year
- Official Repo for FoodieQA paper (EMNLP 2024) ☆16 · Updated 3 months ago
- Analyzing and Reducing Catastrophic Forgetting in Parameter Efficient Tuning ☆35 · Updated 11 months ago
- Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models". ☆12 · Updated last year
- ☆68 · Updated 3 weeks ago
- This is the code of MMOA-RAG. ☆78 · Updated 5 months ago
- [EMNLP 2024] "ESC-Eval: Evaluating Emotion Support Conversations in Large Language Models" ☆20 · Updated last year
- Reproduction resources for the linear alignment paper (work in progress) ☆16 · Updated last year
- Public code repo for COLING 2025 paper "Aligning LLMs with Individual Preferences via Interaction" ☆36 · Updated 6 months ago
- Source code of "Reinforcement Learning with Token-level Feedback for Controllable Text Generation" (NAACL 2024) ☆14 · Updated 10 months ago
- Codebase for Math Neurosurgery: Isolating LLMs' Math Reasoning Abilities Using Only Forward Passes ☆19 · Updated 4 months ago
- [ICLR'25] DataGen: Unified Synthetic Dataset Generation via Large Language Models ☆64 · Updated 7 months ago
- [2025-TMLR] A Survey on the Honesty of Large Language Models ☆59 · Updated 10 months ago
- ☆30 · Updated 7 months ago
- [ICLR 2025] Language Imbalance Driven Rewarding for Multilingual Self-improving ☆22 · Updated last month
- Code for ACL 2024 accepted paper titled "SAPT: A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language … ☆36 · Updated 9 months ago
- This is my attempt to create Self-Correcting-LLM based on the paper Training Language Models to Self-Correct via Reinforcement Learning by g… ☆37 · Updated 3 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆137 · Updated 3 months ago
- [ACL2024] A Codebase for Incremental Learning with Large Language Models; Official released code for "Learn or Recall? Revisiting Increme… ☆52 · Updated 8 months ago
- Reinforced Multi-LLM Agents training ☆51 · Updated 4 months ago
- ☆17 · Updated 9 months ago
- Source code of paper: A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models. (ICML 2025) ☆33 · Updated 6 months ago
- Code for the paper Query-Dependent Prompt Evaluation and Optimization with Offline Inverse Reinforcement Learning ☆42 · Updated last year
- Official code for paper "SPA-RL: Reinforcing LLM Agent via Stepwise Progress Attribution" ☆45 · Updated last month
- A curated list of personalized alignment resources (continually updated). ☆42 · Updated 2 months ago
- A method of ensemble learning for heterogeneous large language models. ☆61 · Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆148 · Updated 8 months ago
- Adapt an LLM model to a Mixture-of-Experts model using Parameter Efficient finetuning (LoRA), injecting the LoRAs in the FFN. ☆61 · Updated last year
- [ACL 2023] Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation ☆14 · Updated 2 years ago