tianjunz/HIR

☆157

Alternatives and similar repositories for HIR

Users who are interested in HIR are comparing it to the libraries listed below.

haoliuhl / chain-of-hindsight
Simple next-token-prediction for RLHF
☆228 · Sep 30, 2023 · Updated 2 years ago
rll-research / finetune-vs-metarl
☆14 · May 31, 2022 · Updated 4 years ago
GXimingLu / Quark
☆75 · Nov 3, 2023 · Updated 2 years ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆63 · Nov 27, 2024 · Updated last year
allenai / RL4LMs
A modular RL library to fine-tune language models to human preferences
☆2,393 · Mar 1, 2024 · Updated 2 years ago
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753 · Jan 8, 2024 · Updated 2 years ago
allenai / data-efficient-finetuning
Code for paper 'Data-Efficient FineTuning'
☆28 · May 24, 2023 · Updated 3 years ago
rmshin / llm-mcts
☆40 · Jun 19, 2024 · Updated 2 years ago
GanjinZero / RRHF
[NIPS2023] RRHF & Wombat
☆805 · Sep 22, 2023 · Updated 2 years ago
seonghyeonye / Flipped-Learning
[ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners
☆117 · Jun 28, 2025 · Updated last year
ctlllll / reward_collapse
☆26 · May 30, 2023 · Updated 3 years ago
allenai / feb
Code associated with the paper: "Few-Shot Self-Rationalization with Natural Language Prompts"
☆12 · Apr 27, 2022 · Updated 4 years ago
google-research / FLAN
☆1,565 · Jul 2, 2026 · Updated 3 weeks ago
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆169 · May 7, 2024 · Updated 2 years ago
syncdoth / Chain-of-Hindsight-PyTorch
Unofficial implementation of Chain of Hindsight (https://arxiv.org/abs/2302.02676) using pytorch and huggingface Trainers.
☆11 · Apr 5, 2023 · Updated 3 years ago
mingkaid / rl-prompt
Accompanying repo for the RLPrompt paper
☆366 · Jun 6, 2024 · Updated 2 years ago
IBM / Dromedary
Dromedary: towards helpful, ethical and reliable LLMs.
☆1,138 · Sep 18, 2025 · Updated 10 months ago
anthropics / ConstitutionalHarmlessnessPaper
☆266 · Dec 21, 2022 · Updated 3 years ago
tatsu-lab / alpaca_farm
A simulation framework for RLHF and alternatives. Develop your RLHF method without collecting human data.
☆845 · Jul 1, 2024 · Updated 2 years ago
FranxYao / GPT-Bargaining
Code for Arxiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback
☆207 · May 24, 2023 · Updated 3 years ago
microsoft / KID
Knowledge Infused Decoding
☆70 · Dec 31, 2023 · Updated 2 years ago
lucy3 / whos_filtered
☆15 · Oct 4, 2024 · Updated last year
tomekkorbak / pretraining-with-human-feedback
Code accompanying the paper Pretraining Language Models with Human Preferences
☆182 · Feb 13, 2024 · Updated 2 years ago
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,852 · Jun 17, 2025 · Updated last year
CarperAI / cheese
Used for adaptive human in the loop evaluation of language and embedding models.
☆306 · Mar 1, 2023 · Updated 3 years ago
Dahoas / reward-modeling
☆98 · May 30, 2023 · Updated 3 years ago
r-three / t-few
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"
☆460 · Sep 6, 2023 · Updated 2 years ago
agi-templar / Stable-Alignment
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative of RLHF. Code for the paper "Training Socially Aligned Langu…
☆356 · Jun 18, 2023 · Updated 3 years ago
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆59 · Jan 12, 2023 · Updated 3 years ago
JasonMa2016 / SMODICE
Official repository for paper "Versatile Offline Imitation from Observations and Examples via Regularized State-Occupancy Matching" (ICML…
☆30 · Jan 12, 2023 · Updated 3 years ago
allenai / natural-instructions
Expanding natural instructions
☆1,045 · Dec 11, 2023 · Updated 2 years ago
CarperAI / InstructGPT
For experiments involving instruct gpt. Currently used for documenting open research questions.
☆71 · Nov 8, 2022 · Updated 3 years ago
allenai / few_shot_explanations
Code for NAACL 2022 paper "Reframing Human-AI Collaboration for Generating Free-Text Explanations"
☆29 · Apr 28, 2023 · Updated 3 years ago
PKU-Alignment / safe-rlhf
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
☆1,610 · Nov 24, 2025 · Updated 8 months ago
allenai / FineGrainedRLHF
☆283 · Jan 6, 2025 · Updated last year
facebookresearch / dpr-scale
Scalable training for dense retrieval models.
☆298 · Jul 2, 2026 · Updated 3 weeks ago
Shentao-YANG / Preference_Grounded_Guidance
Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17 · Jan 8, 2025 · Updated last year
tianjunz / TEMPERA
☆46 · Apr 10, 2023 · Updated 3 years ago
ari-holtzman / newformer
☆16 · Jul 20, 2023 · Updated 3 years ago