ADaM-BJTU / W2SGLinks

The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”

☆17

Alternatives and similar repositories for W2SG

Users that are interested in W2SG are comparing it to the libraries listed below

Sorting:

GAIR-NLP / BeHonest
BeHonest: Benchmarking Honesty in Large Language Models
☆34Updated last year
FreedomIntelligence / OVM
☆70Updated last year
GAIR-NLP / ReasonEval
[AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
☆76Updated 2 months ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆64Updated last year
yuzhaouoe / SAE-based-representation-engineering
[NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
☆68Updated last year
hkust-nlp / Activation_Decoding
In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)
☆62Updated last year
Re-Align / AlignTDS
Analyzing LLM Alignment via Token distribution shift
☆17Updated last year
Zhou-Zoey / RMB-Reward-Model-Benchmark
☆46Updated 9 months ago
dannyallover / overthinking_the_truth
☆29Updated last year
HillZhang1999 / ICD
Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"
☆69Updated last year
eric-mitchell / serac
Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model
☆71Updated 3 years ago
RUCAIBox / RLMEC
The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"
☆39Updated last year
RZFan525 / Awesome-ScalingLaws
A curated list of awesome resources dedicated to Scaling Laws for LLMs
☆80Updated 2 years ago
F2-Song / ICDPO
The official implementation of "ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization…
☆16Updated last year
SparkJiao / dpo-trajectory-reasoning
[EMNLP 2024] Source code for the paper "Learning Planning-based Reasoning with Trajectory Collection and Process Rewards Synthesizing".
☆83Updated 11 months ago
deeplearning-wisc / picle
Official code for ICML 2024 paper on Persona In-Context Learning (PICLe)
☆26Updated last year
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30Updated last year
princeton-nlp / benign-data-breaks-safety
☆43Updated last year
Shentao-YANG / Preference_Grounded_Guidance
Source codes for "Preference-grounded Token-level Guidance for Language Model Fine-tuning" (NeurIPS 2023).
☆17Updated 11 months ago
YJiangcm / LTE
[ACL 2024] Learning to Edit: Aligning LLMs with Knowledge Editing
☆36Updated last year
gl-ybnbxb / BoNBoN
☆18Updated last year
tatsu-lab / test_set_contamination
☆41Updated 2 years ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
Reason-Wang / NAT
[NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…
☆29Updated last year
GAIR-NLP / alignment-for-honesty
☆77Updated last year
BunsenFeng / AbstainQA
AbstainQA, ACL 2024
☆28Updated last year
RLHFlow / RAFT
This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re…
☆38Updated last year
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆67Updated last year
RUCAIBox / HaluEval-2.0
☆48Updated last year
WeiminXiong / IPR
Watch Every Step! LLM Agent Learning via Iterative Step-level Process Refinement (EMNLP 2024 Main Conference)
☆64Updated last year