archiki / ReCEvalLinks

Supporting code for ReCEval paper

☆31

Alternatives and similar repositories for ReCEval

Users that are interested in ReCEval are comparing it to the libraries listed below

Sorting:

wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆41Updated 2 years ago
sail-sg / symbolic-instruction-tuning
The official repository for the paper "From Zero to Hero: Examining the Power of Symbolic Tasks in Instruction Tuning".
☆66Updated 2 years ago
nkandpa2 / long_tail_knowledge
Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge"
☆78Updated 2 years ago
YuxiXie / SelfEval-Guided-Decoding
☆102Updated 2 years ago
eladsegal / strategyqa
The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies".
☆81Updated 3 years ago
Zce1112zslx / IKE
☆41Updated 2 years ago
HKUNLP / icl-ceil
[ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”.
☆102Updated 2 years ago
nayeon7lee / FactualityPrompt
☆88Updated 3 years ago
csitfun / LogiCoT
the instructions and demonstrations for building a formal logical reasoning capable GLM
☆55Updated last year
sunlab-osu / Understanding-CoT
☆88Updated 2 years ago
liujch1998 / rainier
☆28Updated last year
GXimingLu / neurologic_decoding
☆83Updated 2 years ago
hkust-nlp / felm
Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆63Updated 2 years ago
allenai / DecomP
Repository for Decomposed Prompting
☆95Updated 2 years ago
allenai / natural-instructions-v1
Benchmarking Generalization to New Tasks from Natural Language Instructions
☆26Updated 4 years ago
feyzaakyurek / rl4f
Code for RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. ACL 2023.
☆64Updated last year
wenhuchen / Time-Sensitive-QA
Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions"
☆75Updated 3 years ago
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆61Updated 11 months ago
Nanami18 / Snowballed_Hallucination
☆44Updated last year
xlang-ai / icl-selective-annotation
[ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners"
☆109Updated 2 years ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132Updated 2 years ago
OhadRubin / EPR
☆64Updated 3 years ago
xu1998hz / InstructScore_SEScore3
First explanation metric (diagnostic report) for text generation evaluation
☆62Updated 10 months ago
yizhongw / llm-temporal-alignment
Methods and evaluation for aligning language models temporally
☆30Updated last year
jzbjyb / lm-calibration
☆35Updated 4 years ago
allenai / csqa2
☆36Updated last year
HKUNLP / ZeroGen
[EMNLP 2022] Code for our paper “ZeroGen: Efficient Zero-shot Learning via Dataset Generation”.
☆16Updated 3 years ago
allenai / Lila
A unified benchmark for math reasoning
☆89Updated 2 years ago
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆165Updated last year