hgaurav2k / JEEBench

Repository for the code and dataset for the paper: "Have LLMs Advanced enough? Towards Harder Problem Solving Benchmarks For Large Language Models"

☆39

Alternatives and similar repositories for JEEBench

Users who are interested in JEEBench are comparing it to the repositories listed below.

ekinakyurek / google-research
Google Research
☆46 Updated 3 years ago
srush / LLM-Talk
☆52 Updated last year
felixzli / synthetic_pretraining
☆38 Updated 3 years ago
ctlllll / understanding_llm_benchmarks
Understanding the correlation between different LLM benchmarks
☆29 Updated last year
hamishivi / EasyLM
Large language models (LLMs) made easy, EasyLM is a one stop solution for pre-training, finetuning, evaluating and serving LLMs in JAX/Fl…
☆75 Updated last year
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆83 Updated last year
hadasah / btm
☆76 Updated last year
facebookresearch / coder_reviewer_reranking
Official code release for the paper Coder Reviewer Reranking for Code Generation.
☆45 Updated 2 years ago
google-deepmind / emergent_in_context_learning
☆85 Updated last year
princeton-nlp / TransformerPrograms
[NeurIPS 2023] Learning Transformer Programs
☆162 Updated last year
jxiw / BiGS
Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆115 Updated last year
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆29 Updated this week
protagolabs / odyssey-math
☆83 Updated 9 months ago
krandiash / quinine
A library to create and manage configuration files, especially for machine learning projects.
☆80 Updated 3 years ago
allenai / SciRIFF
Dataset and evaluation suite enabling LLM instruction-following for scientific literature understanding.
☆44 Updated 8 months ago
EleutherAI / semantic-memorization
☆44 Updated last year
wellecks / naturalproofs
NaturalProofs: Mathematical Theorem Proving in Natural Language (NeurIPS 2021 Datasets & Benchmarks)
☆133 Updated 3 years ago
allenai / fermi
☆31 Updated 4 years ago
allenai / Lila
A unified benchmark for math reasoning
☆89 Updated 2 years ago
nyu-mll / ILF-for-code-generation
☆80 Updated 7 months ago
allenai / easy-to-hard-generalization
Code for the arXiv preprint "The Unreasonable Effectiveness of Easy Training Data"
☆48 Updated last year
joyheyueya / declarative-math-word-problem
☆49 Updated 2 years ago
peterbhase / SLAG-Belief-Updating
Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs"
☆28 Updated 3 years ago
VITA-Group / ChainCoder
[ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …
☆43 Updated 2 years ago
neelsjain / BYOD
The Official Repository for "Bring Your Own Data! Self-Supervised Evaluation for Large Language Models"
☆107 Updated 2 years ago
kaistAI / factual-knowledge-acquisition
☆23 Updated 3 weeks ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences (TOSEM 2025)
☆72 Updated last year
csinva / iprompt
Finding semantically meaningful and accurate prompts.
☆48 Updated 2 years ago
McGill-NLP / length-generalization
Code for the paper "The Impact of Positional Encoding on Length Generalization in Transformers", NeurIPS 2023
☆137 Updated last year
ylsung / vl-merging
PyTorch code for the paper "An Empirical Study of Multimodal Model Merging"
☆37 Updated 2 years ago