sun-wendy / DafnyBench

DafnyBench: A Benchmark for Formal Software Verification

☆41

Alternatives and similar repositories for DafnyBench

Users who are interested in DafnyBench are comparing it to the repositories listed below.

Mondego / dafny-synthesis
[FSE-2024] Towards AI-Assisted Synthesis of Verified Dafny Methods
☆48 Updated last year
zhaoyu-li / DL4TP
[COLM 2024] A Survey on Deep Learning for Theorem Proving
☆195 Updated last month
wiio12 / LEGO-Prover
Code for the paper LEGO-Prover: Neural Theorem Proving with Growing Libraries
☆65 Updated last year
xiye17 / SAT-LM
SatLM: SATisfiability-Aided Language Models using Declarative Prompting (NeurIPS 2023)
☆49 Updated last year
cmu-l3 / alphaverus
AlphaVerus: Formally Verified Code Generation through Self-Improving Translation and Treefinement
☆13 Updated 2 months ago
kfdong / STP
The official implementation of "Self-play LLM Theorem Provers with Iterative Conjecturing and Proving"
☆98 Updated 3 months ago
Miracle-Messi / Isa-AutoFormal
☆15 Updated 8 months ago
Sphere-AI-Lab / FormalMATH-Bench
☆58 Updated last month
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆149 Updated 9 months ago
jdnklau / fm-ml
Collection of resources for research concerning Machine Learning and Formal Methods.
☆88 Updated 3 years ago
yangky11 / miniF2F-lean4
☆61 Updated 2 weeks ago
facebookresearch / miniF2F
An updated version of miniF2F with lots of fixes and informal statements / solutions.
☆88 Updated 6 months ago
Goedel-LM / Goedel-Prover
☆185 Updated 3 months ago
j991222 / ai4math-papers
AI for Mathematics (AI4Math) paper list
☆169 Updated 9 months ago
trishullab / PutnamBench
An evaluation benchmark for undergraduate competition math in Lean4, Isabelle, Coq, and natural language.
☆140 Updated last week
lean-dojo / ReProver
Retrieval-Augmented Theorem Provers for Lean
☆281 Updated 5 months ago
ganler / code-r1
Reproducing R1 for Code with Reliable Rewards
☆237 Updated 2 months ago
MoonshotAI / CombiBench
☆27 Updated 3 weeks ago
thunlp / DebugBench
The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".
☆78 Updated last year
ChuyueSun / Clover
Clover: Closed-Loop Verifiable Code Generation
☆35 Updated 2 months ago
albertqjiang / Portal-to-ISAbelle
https://albertqjiang.github.io/Portal-to-ISAbelle/
☆56 Updated last year
ise-uiuc / xft
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts
☆33 Updated last year
albertqjiang / draft_sketch_prove
☆67 Updated last year
ise-uiuc / blazedit
Making code editing up to 7.7x faster using multi-layer speculation
☆21 Updated 4 months ago
FudanSELab / ClassEval
Benchmark ClassEval for class-level code generation.
☆144 Updated 8 months ago
zkx06111 / ALGO
☆35 Updated 2 years ago
loganrjmurphy / LeanEuclid
LeanEuclid is a benchmark for autoformalization in the domain of Euclidean geometry, targeting the proof assistant Lean.
☆101 Updated 2 months ago
trishullab / copra
COPRA: An in-COntext PRoof Agent which uses LLMs like GPTs to prove theorems in formal languages.
☆64 Updated 2 months ago
rookie-joe / PDA
☆31 Updated 6 months ago
llm4code / 2024
The First International Workshop on Large Language Model for Code 2024 (Co-Located with ICSE 2024)
☆17 Updated 9 months ago