OpenLMLab / GAOKAO-Bench-2023

GAOKAO-Bench-2023 is a supplement to GAOKAO-Bench, a dataset for evaluating large language models.

☆18

Related projects

Alternatives and complementary repositories for GAOKAO-Bench-2023

Zheng0428 / COIG-Kun
☆35 Updated 2 months ago
KwaiKEG / CogGPT
Unleashing the Power of Cognitive Dynamics on Large Language Models
☆60 Updated last month
bytarnish / AGILE
☆54 Updated last month
USTC-StarTeam / ZIP
☆17 Updated 4 months ago
THUDM / ChatGLM-Math
☆78 Updated 7 months ago
zhaochenyang20 / Prompt2Model-Self-Guide
SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper
☆28 Updated 5 months ago
shuyhere / about-super-alignment
Feeling confused about super alignment? Here is a reading list
☆43 Updated 10 months ago
crazycth / WizardLearner
Pretrain, decay, and SFT a CodeLLM from scratch 🧙‍♂️
☆32 Updated 6 months ago
KbsdJames / MATH-Minos
The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee…
☆33 Updated 3 months ago
HqWu-HITCS / Awesome-Personalized-LLM
This repo aims to collect resources on role-playing abilities in LLMs, including datasets, papers, applications, etc.
☆55 Updated last month
MikeGu721 / AgentGroup
☆83 Updated 7 months ago
dongguanting / DPA-RAG
The code and data of DPA-RAG
☆50 Updated last month
cby-pku / aligner
[NeurIPS 2024 Oral] Aligner: Efficient Alignment by Learning to Correct
☆120 Updated last week
yyDing1 / ScaleQuest
We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.
☆51 Updated 3 weeks ago
Junjie-Ye / ToolEyes
ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios
☆62 Updated 7 months ago
Bui1dMySea / MemLong
☆78 Updated 2 months ago
LightChen233 / reasoning-boundary
☆20 Updated last month
llmeval / llmeval-3
Chinese large language model evaluation, round 3
☆24 Updated 5 months ago
MTU-Bench-Team / MTU-Bench
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
☆18 Updated last month
MadeAgents / Hammer
Hammer: Robust Function-Calling for On-Device Language Models via Function Masking
☆33 Updated this week
NumberChiffre / mcts-llm
☆61 Updated this week
percent4 / llm_math_solver
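This project covers dataset synthesis, model training, and evaluation for the mathematical problem-solving ability of large models, along with notes on related articles.
☆55 Updated 2 months ago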
TemporaryLoRA / Temp-LoRA
☆89 Updated 7 months ago
YanqiDai / MMRole
A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
☆31 Updated this week
QwenLM / online_merging_optimizers
Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment"
☆66 Updated 5 months ago
open-compass / GTA
[NeurIPS 2024 D&B Track] GTA: A Benchmark for General Tool Agents
☆46 Updated 2 weeks ago
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆36 Updated 8 months ago
OpenLMLab / scaling-rope
code for Scaling Laws of RoPE-based Extrapolation
☆70 Updated last year
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆52 Updated 7 months ago
GAIR-NLP / ReAlign
Reformatted Alignment
☆112 Updated last month