☆27Mar 13, 2024Updated 2 years ago
Alternatives and similar repositories for rlhf-length-biases
Users that are interested in rlhf-length-biases are comparing it to the libraries listed below.
- ACL24☆11Jun 7, 2024Updated last year
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning [ICML 2024]☆21May 2, 2024Updated 2 years ago
- Code for Massive-scale Decoding for Text Generation using Lattices☆44Jul 29, 2022Updated 3 years ago
- Data processing for the Collective Constitutional AI project (a collaboration between The Collective Intelligence Project & Anthropic)☆26Oct 17, 2023Updated 2 years ago
- ☆10Jun 5, 2025Updated 11 months ago
- Directional Preference Alignment☆62Sep 23, 2024Updated last year
- ☆17Feb 14, 2024Updated 2 years ago
- ☆24Mar 21, 2025Updated last year
- Code for "Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining"☆28Oct 14, 2025Updated 7 months ago
- Adapts Google's official ROUGE implementation for use with Korean☆17Jan 3, 2024Updated 2 years ago
- ☆36Oct 4, 2023Updated 2 years ago
- [EMNLP 2023] Question Answering as Programming for Solving Time-Sensitive Questions☆12Dec 18, 2023Updated 2 years ago
- ☆36Feb 20, 2025Updated last year
- Script to pre-train Hugging Face transformers BART with TensorFlow 2☆35Apr 13, 2023Updated 3 years ago
- TDD-Bench-Verified is a new benchmark for generating test cases for test-driven development (TDD)☆29Apr 28, 2026Updated last month
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward☆33Oct 5, 2025Updated 7 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]☆32Jan 23, 2025Updated last year
- A large-scale, fine-grained, diverse preference dataset (and models).☆368Dec 29, 2023Updated 2 years ago
- ☆21May 24, 2023Updated 3 years ago
- ☆26Jun 5, 2025Updated 11 months ago
- ☆19Sep 20, 2022Updated 3 years ago
- CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)☆74Jun 25, 2024Updated last year
- ☆30Dec 27, 2024Updated last year
- MuJoCo benchmark for Deep Reinforcement Learning as provided by Tianshou framework.☆15Jan 12, 2025Updated last year
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback☆12Jul 13, 2022Updated 3 years ago
- [COLM 2025] Official code for "When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoni…☆15Oct 31, 2025Updated 6 months ago
- Code for the paper - Controlling Dialogue Generation with Semantic Exemplars (Naacl 2021) A semantic exemplar based retrieve-refine appro…☆18Mar 26, 2021Updated 5 years ago
- Low-probability Tokens Sustain Exploration in Reinforcement Learning with Verifiable Reward☆44Nov 18, 2025Updated 6 months ago
- ☆16Jul 10, 2023Updated 2 years ago
- INSCIT: Information-Seeking Conversations with Mixed-Initiative Interactions☆16Jan 21, 2025Updated last year
- Korean datasets available on Hugging Face☆36Oct 10, 2024Updated last year
- NamuwikiExtractor, a tool for obtaining cleaned text from Namuwiki dumps☆19Feb 27, 2022Updated 4 years ago
- ☆10Sep 26, 2023Updated 2 years ago
- [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering☆44Jun 18, 2022Updated 3 years ago
- Open-source Human Feedback Library☆11Oct 25, 2023Updated 2 years ago
- Machine Generated Captions for Best Artworks☆22Sep 21, 2022Updated 3 years ago
- [EMNLP 2024 Main] Official implementation of the paper "The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Langua…☆13Nov 11, 2024Updated last year
- ☆10Nov 29, 2024Updated last year
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning (EMNLP 2025)☆59Oct 10, 2025Updated 7 months ago