Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"
☆31 · Jun 5, 2025 · Updated 9 months ago
Alternatives and similar repositories for BLEUBERI
Users who are interested in BLEUBERI are comparing it to the repositories listed below
- Implementation for the paper "Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning" ☆11 · Jan 10, 2025 · Updated last year
- [NAACL Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin… ☆51 · May 4, 2024 · Updated last year
- ☆21 · Jul 21, 2025 · Updated 7 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability in information retrieval models. Our foc… ☆32 · Jun 13, 2024 · Updated last year
- Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF ☆24 · Oct 8, 2024 · Updated last year
- An Empirical Study On Contrastive Search And Contrastive Decoding For Open-ended Text Generation ☆27 · Jun 7, 2024 · Updated last year
- [ICLR 2026] PSFT is a trust-region-inspired fine-tuning objective that views SFT as a policy gradient method with constant advantages, co… ☆35 · Sep 9, 2025 · Updated 5 months ago
- The repository contains code for Adaptive Data Optimization ☆32 · Dec 9, 2024 · Updated last year
- [EMNLP 2025] Verification Engineering for RL in Instruction Following ☆51 · Jan 5, 2026 · Updated 2 months ago
- ☆31 · Sep 23, 2024 · Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆33 · Sep 28, 2025 · Updated 5 months ago
- The official repo for "VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search" [EMNLP25] ☆38 · Feb 1, 2026 · Updated last month
- rl from zero pretrain, can it be done? yes. ☆287 · Sep 28, 2025 · Updated 5 months ago
- Code for Copula conformal prediction paper (ICLR 2024) ☆31 · Sep 26, 2024 · Updated last year
- Learning from preferences is a common paradigm for fine-tuning language models. Yet, many algorithmic design decisions come into play. Ou… ☆32 · Apr 20, 2024 · Updated last year
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Aug 3, 2025 · Updated 7 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint" ☆39 · Jan 12, 2024 · Updated 2 years ago
- ☆36 · Sep 6, 2024 · Updated last year
- Automated Continuous Data Quality Measurement ☆12 · Nov 15, 2023 · Updated 2 years ago
- Code release for "Imagination Mechanism: Mesh Information Propagation for Enhancing Data Efficiency in Reinforcement Learning" ☆13 · Oct 7, 2023 · Updated 2 years ago
- DreamSmooth: Improving Model-Based RL with Reward Smoothing (ICLR 2024) ☆12 · May 6, 2024 · Updated last year
- The official repo for "CodeScaler: Scaling Code LLM Training and Test-Time Inference via Execution-Free Reward Models" ☆29 · Feb 23, 2026 · Updated last week
- LLM Skirmish ☆44 · Feb 3, 2026 · Updated last month
- Generate Quiz Question from PDF/Text files ☆11 · Feb 2, 2024 · Updated 2 years ago
- ☆16 · Feb 22, 2025 · Updated last year
- code for polite ☆11 · Feb 28, 2024 · Updated 2 years ago
- ☆11 · Jan 11, 2022 · Updated 4 years ago
- ☆11 · Jun 4, 2023 · Updated 2 years ago
- Sample notebooks for Juno ☆11 · Mar 1, 2025 · Updated last year
- A collection of heat engines, based on the OpenAI Gym environment framework for use with reinforcement learning applications. ☆15 · Dec 20, 2021 · Updated 4 years ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆52 · Jul 15, 2025 · Updated 7 months ago
- ☆14 · Mar 21, 2024 · Updated last year
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆186 · May 25, 2025 · Updated 9 months ago
- Self-Alignment with Principle-Following Reward Models ☆170 · Sep 18, 2025 · Updated 5 months ago
- A speed comparison between the GPUs offered by Google Colab vs the MacBook M1 Max 24 Core chip ☆10 · May 25, 2023 · Updated 2 years ago
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024" ☆13 · Feb 14, 2025 · Updated last year
- An automated multi-step research system for executing deep, comprehensive research with iterative refinement, source evaluation, and resu… ☆14 · Mar 11, 2025 · Updated 11 months ago
- Code for reproducing our paper: LMSOC: An Approach for Socially Sensitive Pretraining ☆13 · Oct 22, 2021 · Updated 4 years ago
- Experiments Notebook of "Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism" ☆14 · Apr 30, 2025 · Updated 10 months ago