GanjinZero / math401-llm

Source code and datasets for "How well do Large Language Models perform in Arithmetic tasks?"

☆57

Alternatives and similar repositories for math401-llm

Users who are interested in math401-llm are comparing it to the repositories listed below.

qinyiwei / InfoBench
☆57 Updated last year
thu-coai / PICL
Code for ACL2023 paper: Pre-Training to Learn in Context
☆106 Updated last year
Spico197 / awesome-lm-evaluation
🩺 A collection of ChatGPT evaluation reports on various benchmarks.
☆50 Updated 2 years ago
FranxYao / FlanT5-CoT-Specialization
Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.
☆132 Updated 2 years ago
csitfun / LogiQA2.0
LogiQA2.0 dataset - logical reasoning in MRC and NLI tasks
☆100 Updated 2 years ago
thu-coai / ComplexBench
Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track)
☆97 Updated 9 months ago
princeton-nlp / LLMBar
[ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
☆134 Updated last year
THU-KEG / KoLA
[ICLR24] The open-source repo of THU-KEG's KoLA benchmark.
☆51 Updated 2 years ago
qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆63 Updated last year
princeton-nlp / Collie
[ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
☆57 Updated 2 years ago
hkust-nlp / felm
GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)
☆61 Updated last year
wzhouad / context-faithful-llm
Code and data for paper "Context-faithful Prompting for Large Language Models".
☆41 Updated 2 years ago
csitfun / LogiCoT
The instructions and demonstrations for building a GLM capable of formal logical reasoning
☆55 Updated last year
allenai / DecomP
Repository for Decomposed Prompting
☆95 Updated 2 years ago
gpt4life / alpagasus
Unofficial implementation of AlpaGasus
☆93 Updated 2 years ago
nayeon7lee / FactualityPrompt
☆87 Updated 3 years ago
salesforce / factualNLG
Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond"
☆60 Updated 10 months ago
XiangLi1999 / ContrastiveDecoding
Contrastive decoding
☆204 Updated 3 years ago
chaochun / nlu-asdiv-dataset
☆50 Updated 2 years ago
TIGER-AI-Lab / MAmmoTH2
Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]
☆149 Updated last year
orhonovich / instruction-induction
☆67 Updated 3 years ago
YJiangcm / FollowBench
[ACL 2024] FollowBench: A Multi-level Fine-grained Constraints Following Benchmark for Large Language Models
☆117 Updated 5 months ago
siyuyuan / coscript
Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning
☆36 Updated 2 years ago
WadeYin9712 / Dynosaur
Code and data for "Dynosaur: A Dynamic Growth Paradigm for Instruction-Tuning Data Curation" (EMNLP 2023)
☆64 Updated 2 years ago
i-Eval / FairEval
☆142 Updated 2 years ago
Spico197 / Humpback
🐋 An unofficial implementation of Self-Alignment with Instruction Backtranslation.
☆138 Updated 7 months ago
arian-askari / ChatGPT-RetrievalQA-CIKM2023
A dataset for training/evaluating Question Answering Retrieval models on ChatGPT responses with the possibility to training/evaluating on…
☆141 Updated last year
AI21Labs / factor
Code and data for the FACTOR paper
☆52 Updated 2 years ago
yinzhangyue / SelfAware
Do Large Language Models Know What They Don’t Know?
☆102 Updated last year
YuxiXie / SelfEval-Guided-Decoding
☆103 Updated 2 years ago