MLGroupJLU / LLM-eval-survey

The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".

☆1,569

Alternatives and similar repositories for LLM-eval-survey

Users interested in LLM-eval-survey are comparing it to the repositories listed below.


tjunlp-lab / Awesome-LLMs-Evaluation-Papers
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
☆782 · Updated last year
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,748 · Updated last year
GaryYufei / AlignLLMHumanSurvey
Aligning Large Language Models with Human: A Survey
☆735 · Updated 2 years ago
AGI-Edgerunners / LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
☆1,201 · Updated last year
tatsu-lab / alpaca_eval
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
☆1,877 · Updated 2 months ago
HillZhang1999 / llm-hallucination-survey
Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large …
☆1,052 · Updated 3 weeks ago
Timothyxxx / Chain-of-ThoughtsPapers
A trend that starts with "Chain of Thought Prompting Elicits Reasoning in Large Language Models".
☆2,075 · Updated 2 years ago
WeOpenML / PandaLM
☆922 · Updated last year
zjunlp / Prompt4ReasoningPapers
[ACL 2023] Reasoning with Language Model Prompting: A Survey
☆984 · Updated 5 months ago
Paitesanshi / LLM-Agent-Survey
☆2,853 · Updated 8 months ago
amazon-science / auto-cot
Official implementation for "Automatic Chain of Thought Prompting in Large Language Models" (stay tuned & more will be updated)
☆1,956 · Updated last year
thunlp / ToolLearningPapers
☆908 · Updated last year
THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆2,864 · Updated last week
dqxiu / ICL_PaperList
Paper List for In-context Learning 🌷
☆867 · Updated last year
zjunlp / EasyEdit
[ACL 2024] An Easy-to-use Knowledge Editing Framework for LLMs.
☆2,591 · Updated last week
SinclairCoder / Instruction-Tuning-Papers
Reading list of instruction tuning. A trend that starts with Natural-Instructions (ACL 2022), FLAN (ICLR 2022), and T0 (ICLR 2022).
☆770 · Updated 2 years ago
ruixiangcui / AGIEval
☆765 · Updated last year
Tongji-KGLLM / RAG-Survey
☆2,091 · Updated last year
keirp / automatic_prompt_engineer
☆1,316 · Updated last year
onejune2018 / Awesome-LLM-Eval
Awesome-LLM-Eval: a curated list of tools, datasets/benchmark, demos, leaderboard, papers, docs and models, mainly for Evaluation on LLMs…
☆572 · Updated last month
RenzeLou / awesome-instruction-learning
Papers and Datasets on Instruction Tuning and Following. ✨✨✨
☆503 · Updated last year
bigscience-workshop / promptsource
Toolkit for creating, sharing and using natural language prompts.
☆2,953 · Updated 2 years ago
yaodongC / awesome-instruction-dataset
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca).
☆1,130 · Updated last year
anthropics / hh-rlhf
Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
☆1,788 · Updated 4 months ago
lmmlzn / Awesome-LLMs-Datasets
Summarizes existing representative LLM text datasets.
☆1,365 · Updated last week
AkariAsai / self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,…
☆2,213 · Updated last year
hendrycks / test
Measuring Massive Multitask Language Understanding | ICLR 2021
☆1,505 · Updated 2 years ago
HqWu-HITCS / Awesome-LLM-Survey
An Awesome Collection for LLM Survey
☆377 · Updated 4 months ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,056 · Updated 2 years ago
zjunlp / KnowledgeEditingPapers
Must-read Papers on Knowledge Editing for Large Language Models.
☆1,180 · Updated 3 months ago