Xuekai-Zhu / key-configuration-of-llms
☆24Updated last year
Alternatives and similar repositories for key-configuration-of-llms
Users that are interested in key-configuration-of-llms are comparing it to the libraries listed below
- ☆42Updated 3 months ago
- ☆68Updated last year
- ☆17Updated 2 years ago
- An index of algorithms for reinforcement learning from human feedback (RLHF)☆92Updated last year
- my commonly-used tools☆56Updated 6 months ago
- ☆17Updated 8 months ago
- GenRM-CoT: Data release for verification rationales☆63Updated 9 months ago
- A curated list of awesome resources dedicated to Scaling Laws for LLMs☆75Updated 2 years ago
- ☆74Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO)☆141Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision☆123Updated 10 months ago
- This is the repository that contains the source code for the Self-Evaluation Guided MCTS for online DPO.☆318Updated 11 months ago
- Code for Paper (ReMax: A Simple, Efficient and Effective Reinforcement Learning Method for Aligning Large Language Models)☆188Updated last year
- Code for ACL2024 paper - Adversarial Preference Optimization (APO).☆55Updated last year
- [ICLR2025 Spotlight] Agent Trajectory Synthesis via Guiding Replay with Web Tutorials☆35Updated 4 months ago
- Explore what LLMs are really learning over SFT☆28Updated last year
- [NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct☆180Updated 6 months ago
- ☆42Updated last month
- ☆20Updated 6 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆129Updated this week
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy☆63Updated 7 months ago
- Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning☆165Updated last year
- Information on NLP PhD applications worldwide.☆37Updated 10 months ago
- Feeling confused about super alignment? Here is a reading list☆43Updated last year
- Official repository for ACL 2025 paper "ProcessBench: Identifying Process Errors in Mathematical Reasoning"☆165Updated 2 months ago
- Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"☆182Updated 3 months ago
- [EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".☆80Updated 6 months ago
- Domain-specific preference (DSP) data and customized RM fine-tuning.☆25Updated last year
- ☆278Updated 6 months ago
- A simple implementation of ReasonGenRM.☆16Updated 2 months ago