bytedance / FullStackBench
Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"
☆70Updated 2 months ago
Alternatives and similar repositories for FullStackBench:
Users who are interested in FullStackBench are comparing it to the repositories listed below
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".☆62Updated 7 months ago
- ☆121Updated 2 weeks ago
- A Comprehensive Benchmark for Software Development.☆93Updated 8 months ago
- Inference code of Lingma SWE-GPT☆188Updated 2 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation?☆109Updated 3 months ago
- ☆57Updated 2 months ago
- NaturalCodeBench (Findings of ACL 2024)☆62Updated 4 months ago
- ☆48Updated last year
- Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev…☆36Updated 3 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆233Updated 3 months ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval☆77Updated 5 months ago
- ☆81Updated 10 months ago
- ☆98Updated 2 months ago
- ☆43Updated 8 months ago
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024)☆107Updated 2 months ago
- [ICLR 2025] The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation"☆125Updated 2 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆148Updated this week
- Neural Code Intelligence Survey 2024; Reading lists and resources☆241Updated last week
- Repoformer: Selective Retrieval for Repository-Level Code Completion (ICML 2024)☆51Updated 7 months ago
- ☆139Updated 7 months ago
- Official GitHub repo for AutoDetect, an automated weakness detection framework for LLMs.☆41Updated 7 months ago
- ☆28Updated 3 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆217Updated this week
- ☆45Updated 4 months ago
- Related works and background techniques for OpenAI o1☆210Updated last month
- Official implementation of the paper "How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%)"☆71Updated 3 months ago
- e☆22Updated last week
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆175Updated 4 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆49Updated last month