A hard gym for programming
☆165Jul 7, 2024Updated last year
Alternatives and similar repositories for leetcode-hard-gym
Users who are interested in leetcode-hard-gym are comparing it to the libraries listed below
- [NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning☆3,083Jan 14, 2025Updated last year
- Parallel data preprocessing for NLP and ML.☆34Nov 1, 2024Updated last year
- Study of Pre-Trained Positional Embeddings☆16Nov 6, 2020Updated 5 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated 2 years ago
- StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback☆74Aug 31, 2024Updated last year
- A Toolkit for Fine-Tuning Large Language Models with LoRA and DeepSpeed☆11Apr 14, 2023Updated 2 years ago
- ☆12Dec 20, 2024Updated last year
- Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions☆48Sep 13, 2025Updated 5 months ago
- ☆33Updated this week
- LeetCode solutions, high-performance C++ edition (runtime beats 95%+) VSCode+CMake+Catch2☆11Sep 7, 2025Updated 6 months ago
- jQuery, React and Streamlit applications written by LLMs☆16Dec 24, 2023Updated 2 years ago
- Knowledge Graph based Question Answering benchmark.☆10Feb 1, 2020Updated 6 years ago
- Test/benchmark of using 32-bit pointers in 64-bit code on Windows. Not an actual ABI, only inspired by Linux's x32 ABI.☆13Jun 7, 2019Updated 6 years ago
- RACE is a multi-dimensional benchmark for code generation that focuses on Readability, mAintainability, Correctness, and Efficiency.☆12Oct 12, 2024Updated last year
- [LREC-Coling 2024] PECC: Problem Extraction and Coding Challenges☆14May 30, 2024Updated last year
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".☆85Jul 13, 2024Updated last year
- ☆12Aug 21, 2024Updated last year
- ☆13Feb 25, 2025Updated last year
- Let Models Speak Ciphers: Multiagent Debate through Embeddings☆16Feb 17, 2024Updated 2 years ago
- The official repository of the Omni-MATH benchmark.☆93Dec 22, 2024Updated last year
- Fully customizable Classifier Free Guidance for ComfyUI☆15Jul 14, 2024Updated last year
- Reproducing R1 for Code with Reliable Rewards☆12Apr 9, 2025Updated 10 months ago
- ☆19Sep 16, 2025Updated 5 months ago
- Safe Python Code Execution Environment for Language Models☆17Feb 27, 2026Updated last week
- Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024☆1,693Oct 2, 2025Updated 5 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024☆191Aug 16, 2024Updated last year
- This is the official code for the paper CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning (Neur…☆561Jan 21, 2025Updated last year
- Reflexion: an autonomous agent with dynamic memory and self-reflection☆389Nov 26, 2023Updated 2 years ago
- ☆21Apr 2, 2025Updated 11 months ago
- Summarize the top 30 most popular arXiv papers on Reddit, Hacker News and Hugging Face in the last 30 days. And post them to Slack, Twitt…☆24Jul 5, 2025Updated 8 months ago
- Language models scale reliably with over-training and on downstream tasks☆100Apr 2, 2024Updated last year
- ☆18Sep 7, 2023Updated 2 years ago
- Soccer toy example simulator used in Reinforcement Learning☆12Mar 11, 2018Updated 7 years ago
- ☆16Nov 7, 2020Updated 5 years ago
- ☆17Jun 14, 2023Updated 2 years ago
- Aider's refactoring benchmark exercises based on popular python repos☆80Oct 10, 2024Updated last year
- A multi-programming language benchmark for LLMs☆298Jan 28, 2026Updated last month
- NaturalCodeBench (Findings of ACL 2024)☆68Oct 14, 2024Updated last year
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation☆167Oct 11, 2024Updated last year