microsoft / prose-benchmarksLinks

PROSE Public Benchmark Suite

☆26

Alternatives and similar repositories for prose-benchmarks

Users that are interested in prose-benchmarks are comparing it to the libraries listed below

Sorting:

THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆65Updated 8 months ago
CodeEditorBench / CodeEditorBench
☆47Updated last year
bigcode-project / astraios
Astraios: Parameter-Efficient Instruction Tuning Code Language Models
☆58Updated last year
qishenghu / InstructCoder
InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw
☆61Updated 8 months ago
crux-eval / eval-arena
☆26Updated last week
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆48Updated last year
ntunlp / xCodeEval
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
☆83Updated 9 months ago
amazon-science / llm-code-preference
Training and Benchmarking LLMs for Code Preference.
☆33Updated 7 months ago
nyu-mll / ILF-for-code-generation
☆76Updated 3 months ago
princeton-pli / MeCo
Code for preprint "Metadata Conditioning Accelerates Language Model Pre-training (MeCo)"
☆39Updated last month
kyegomez / Reka-Torch
Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch
☆30Updated this week
rmshin / llm-mcts
☆41Updated last year
multi-swe-bench / multi-swe-bench-env
☆1Updated 9 months ago
Ablustrund / APPS_Plus
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
☆65Updated 10 months ago
rizwan09 / REDCODER
☆45Updated this week
frankxu2004 / knnlm-why
Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"
☆58Updated 2 years ago
martin-wey / CodeUltraFeedback
CodeUltraFeedback: aligning large language models to coding preferences
☆71Updated last year
Ziems / llm-url
☆35Updated last year
SalesforceAIResearch / swecomm
☆27Updated 5 months ago
overwindows / SemanticCodeSearch
Semantic Code Search
☆35Updated 2 years ago
mahimanzum / FixEval
We introduce FixEval , a dataset for competitive programming bug fixing along with a comprehensive test suite and show the necessity of e…
☆23Updated 2 years ago
logic-star-ai / swt-bench
[NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation
☆50Updated 3 weeks ago
theblackcat102 / evol-dataset
evol augment any dataset online
☆59Updated last year
gangiswag / cornstack
☆34Updated last week
SparksofAGI / MHPP
☆31Updated last week
facebookresearch / mbr-exec
code for "Natural Language to Code Translation with Execution"
☆41Updated 2 years ago
yiqingxyq / RepoST
Code for "RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"
☆22Updated 3 months ago
VITA-Group / ChainCoder
[ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …
☆40Updated last year
nlp-uoregon / ullme
☆20Updated 2 months ago
RUCAIBox / BAMBOO
☆35Updated last year