A repo for RLHF training and BoN over LLMs, with support for reward model ensembles.
☆46 · Jan 16, 2025 · Updated last year
Alternatives and similar repositories for llm_optimization
Users who are interested in llm_optimization compare it to the libraries listed below.
- The official implementation of InfoRM [NeurIPS 2024]. ☆15 · Oct 25, 2025 · Updated 4 months ago
- Official Code Repository for [AutoScale: Scale-Aware Data Mixing for Pre-Training LLMs] Published as a conference paper at **COLM 2025*… ☆13 · Aug 8, 2025 · Updated 6 months ago
- ☆15 · Sep 11, 2022 · Updated 3 years ago
- This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI. ☆133 · Nov 16, 2024 · Updated last year
- ☆21 · Dec 17, 2020 · Updated 5 years ago
- Implicit Deep Adaptive Design (iDAD): Policy-Based Experimental Design without Likelihoods ☆22 · Dec 30, 2021 · Updated 4 years ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆28 · Feb 17, 2025 · Updated last year
- Rewarded soups official implementation ☆62 · Sep 27, 2023 · Updated 2 years ago
- Critique-out-Loud Reward Models ☆74 · Oct 18, 2024 · Updated last year
- The Multitask Long Document Benchmark ☆42 · Nov 2, 2022 · Updated 3 years ago
- Implementations of Curious Replay for model-based adaptation. ☆43 · Jul 5, 2023 · Updated 2 years ago
- ☆13 · Nov 5, 2024 · Updated last year
- Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization ☆42 · Apr 24, 2020 · Updated 5 years ago
- Repo of paper "Free Process Rewards without Process Labels" ☆169 · Mar 14, 2025 · Updated 11 months ago
- ☆47 · Mar 25, 2025 · Updated 11 months ago
- [ICML 2023] Meta-SAGE: Scale Meta-Learning Scheduled Adaptation with Guided Exploration for Mitigating Scale Shift on Combinatorial Optim… ☆10 · Dec 19, 2023 · Updated 2 years ago
- SentiStorm - Real-time Twitter Sentiment Classification based on Apache Storm ☆10 · May 22, 2018 · Updated 7 years ago
- Posterior Refinement Improves Sample Efficiency in Bayesian Neural Networks ☆10 · Oct 21, 2022 · Updated 3 years ago
- Collection of gym environments with support for domain randomization ☆10 · Dec 11, 2024 · Updated last year
- Kaggle Competition: IEEE-CIS-Fraud-Detection ☆10 · Jan 18, 2020 · Updated 6 years ago
- A Hadoop Map Reduce application that retrieves data/articles related to sports from sources like NY Times, Commoncrawl, and Twitter and c… ☆13 · Oct 3, 2019 · Updated 6 years ago
- Implementation of the techniques presented in "Co-occurrence Feature Learning from Skeleton Data for Action Recognition" to recognize two… ☆11 · Jul 22, 2019 · Updated 6 years ago
- [AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie… ☆10 · Feb 7, 2026 · Updated 3 weeks ago
- ☆10 · Oct 3, 2023 · Updated 2 years ago
- xlvector's solution of github contest ☆33 · Aug 30, 2009 · Updated 16 years ago
- Code for EMNLP'24 paper - On Diversified Preferences of Large Language Model Alignment ☆16 · Aug 6, 2024 · Updated last year
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation ☆45 · Feb 27, 2023 · Updated 3 years ago
- NLPBench: Evaluating NLP-Related Problem-solving Ability in Large Language Models ☆10 · Oct 27, 2023 · Updated 2 years ago
- Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain … ☆19 · Dec 16, 2022 · Updated 3 years ago
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- ☆11 · Updated this week
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- Our work on Reinforcement learning that we share with the rest of the world ☆13 · Jan 7, 2019 · Updated 7 years ago
- This is AlpaGasus2-QLoRA based on LLaMA2 with AlpaGasus mechanism using QLoRA! ☆15 · Nov 22, 2023 · Updated 2 years ago
- Very concise example of integrated gradients (a method to reveal areas of attention in input images) ☆10 · Jun 17, 2019 · Updated 6 years ago
- Code related to the paper "Asynchronous Batch Bayesian Optimisation with Improved Local Penalisation" ☆13 · May 8, 2019 · Updated 6 years ago
- ☆15 · Dec 2, 2025 · Updated 2 months ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- Learning from Indirect Observations ☆11 · Jul 16, 2021 · Updated 4 years ago