for-ai / m-rewardbench

Official Code for M-RewardBench: Evaluating Reward Models in Multilingual Settings

☆27

Alternatives and similar repositories for m-rewardbench:

Users interested in m-rewardbench are comparing it to the repositories listed below.

mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆96 Updated last year
SimengSun / alpaca_farm_lora
☆22 Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆72 Updated 7 months ago
ekinakyurek / influence
Code for "Tracing Knowledge in Language Models Back to the Training Data"
☆37 Updated 2 years ago
PrasannS / rlhf-length-biases
☆27 Updated last year
mega002 / ff-layers
The accompanying code for "Transformer Feed-Forward Layers Are Key-Value Memories". Mor Geva, Roei Schuster, Jonathan Berant, and Omer Le…
☆90 Updated 3 years ago
Nix07 / finetuning
This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity…
☆25 Updated last year
nouhadziri / faith-and-fate
☆34 Updated last year
yuzhaouoe / pretraining-data-packing
[ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training
☆20 Updated 7 months ago
Zhiyuan-Zeng / EvalTree
[arXiv] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
☆14 Updated last month
shadowkiller33 / Contrast-Instruction
☆19 Updated last year
aviclu / ffn-values
☆61 Updated last year
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆57 Updated 10 months ago
shayne-longpre / a-pretrainers-guide
☆72 Updated last year
meg-tong / sycophancy-eval
Datasets from the paper "Towards Understanding Sycophancy in Language Models"
☆74 Updated last year
UFO-101 / auto-circuit
A library for efficient patching and automatic circuit discovery.
☆62 Updated last month
hannamw / EAP-IG
☆25 Updated this week
allenai / noncompliance
This repository contains data, code and models for contextual noncompliance.
☆21 Updated 8 months ago
kaistAI / Janus
[NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
☆45 Updated 4 months ago
SALT-NLP / demonstrated-feedback
☆119 Updated 6 months ago
SimengSun / ChapterBreak
☆11 Updated 10 months ago
evandez / REMEDI
Inspecting and Editing Knowledge Representations in Language Models
☆115 Updated last year
Nanami18 / Snowballed_Hallucination
☆44 Updated 7 months ago
RLHFlow / Directional-Preference-Alignment
Directional Preference Alignment
☆56 Updated 6 months ago
mnoukhov / async_rlhf
Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models
☆45 Updated 2 weeks ago
skywalker023 / fantom
👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"
☆54 Updated 10 months ago
sylinrl / CalibratedMath
Teaching Models to Express Their Uncertainty in Words
☆38 Updated 2 years ago
KoyenaPal / future-lens
Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State
☆18 Updated last year
hbin0701 / Self-Explore
[EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…
☆49 Updated 11 months ago
YuxiXie / SelfEval-Guided-Decoding
☆95 Updated last year