Xuekai-Zhu / key-configuration-of-llms
☆24 · Updated last year
Alternatives and similar repositories for key-configuration-of-llms
Users that are interested in key-configuration-of-llms are comparing it to the libraries listed below
- ☆16 · Updated 7 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 · Updated 5 months ago
- ☆38 · Updated 2 months ago
- Code accompanying the paper "Noise Contrastive Alignment of Language Models with Explicit Rewards" (NeurIPS 2024) ☆53 · Updated 6 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 8 months ago
- Explore what LLMs are really learning over SFT ☆28 · Updated last year
- Feeling confused about super alignment? Here is a reading list ☆42 · Updated last year
- A Framework for LLM-based Multi-Agent Reinforced Training and Inference ☆89 · Updated last week
- The official code repository for PRMBench. ☆73 · Updated 3 months ago
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ☆41 · Updated 2 weeks ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆140 · Updated 3 months ago
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆79 · Updated 9 months ago
- GenRM-CoT: Data release for verification rationales ☆61 · Updated 7 months ago
- Code for the ACL 2024 paper Adversarial Preference Optimization (APO). ☆54 · Updated last year
- The repository of the project "Fine-tuning Large Language Models with Sequential Instructions", code base comes from open-instruct and LA… ☆29 · Updated 6 months ago
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials ☆32 · Updated 3 months ago
- The code for Consistent In-Context Editing, an approach for tuning language models through contextual distributions, overcoming the limit… ☆29 · Updated 2 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing". ☆78 · Updated 4 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆103 · Updated this week
- ☆17 · Updated last year
- Domain-specific preference (DSP) data and customized RM fine-tuning. ☆25 · Updated last year
- A unified suite for generating elite reasoning problems and training high-performance LLMs, including pioneering attention-free architect… ☆51 · Updated this week
- ☆67 · Updated last year
- ☆59 · Updated 9 months ago
- [ACL 2025, Main Conference] Intuitive Fine-Tuning: Towards Simplifying Alignment into a Single Process ☆28 · Updated 10 months ago
- A simple implementation of ReasonGenRM. ☆12 · Updated last month
- my commonly-used tools ☆56 · Updated 5 months ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆32 · Updated 8 months ago
- ☆25 · Updated last year
- A comprehensive collection of process reward models. ☆88 · Updated 2 weeks ago