THUDM / NaturalCodeBench
NaturalCodeBench (Findings of ACL 2024)
☆65 Updated 7 months ago
Alternatives and similar repositories for NaturalCodeBench
Users who are interested in NaturalCodeBench are comparing it to the repositories listed below
- ☆46 Updated last year
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback ☆64 Updated 9 months ago
- ☆40 Updated 5 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆49 Updated 5 months ago
- ☆47 Updated 5 months ago
- Benchmarking Complex Instruction-Following with Multiple Constraints Composition (NeurIPS 2024 Datasets and Benchmarks Track) ☆83 Updated 3 months ago
- ☆82 Updated last year
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆82 Updated 8 months ago
- Towards Systematic Measurement for Long Text Quality ☆35 Updated 8 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆68 Updated 2 weeks ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 srw ☆62 Updated 7 months ago
- ☆49 Updated last year
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ☆81 Updated 9 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models ☆58 Updated last year
- ☆31 Updated last week
- Lightweight tool to identify Data Contamination in LLMs evaluation ☆51 Updated last year
- Collection of papers for scalable automated alignment. ☆90 Updated 7 months ago
- Code and data for "MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models" ☆41 Updated 7 months ago
- [ICML 2025] Teaching Language Models to Critique via Reinforcement Learning ☆97 Updated 3 weeks ago
- On Memorization of Large Language Models in Logical Reasoning ☆65 Updated 2 months ago
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆63 Updated 7 months ago
- Large Language Models Meet NL2Code: A Survey ☆36 Updated 6 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆48 Updated last year
- ☆63 Updated 6 months ago
- A distributed, extensible, secure solution for evaluating machine generated code with unit tests in multiple programming languages. ☆54 Updated 7 months ago
- ☆32 Updated last week
- ☆66 Updated 2 months ago
- Reformatted Alignment ☆113 Updated 8 months ago
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling ☆101 Updated 4 months ago
- ☆142 Updated 11 months ago