bytedance / SandboxFusion
☆98 · Updated last month
Alternatives and similar repositories for SandboxFusion:
Users interested in SandboxFusion are also comparing it to the repositories listed below.
- A Comprehensive Benchmark for Software Development. ☆88 · Updated 7 months ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆128 · Updated 7 months ago
- A flexible and efficient training framework for large-scale alignment tasks ☆277 · Updated this week
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open-source framework for evaluating foundation models. ☆231 · Updated 2 months ago
- Code for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718 ☆303 · Updated 3 months ago
- Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders" ☆58 · Updated last month
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ☆210 · Updated 3 months ago
- ☆152 · Updated 4 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆104 · Updated 2 months ago
- ☆50 · Updated last month
- A visualization tool for deeper understanding and easier debugging of RLHF training. ☆106 · Updated last week
- [ACL 2024] LooGLE: Long Context Evaluation for Long-Context Language Models ☆171 · Updated 3 months ago
- ☆247 · Updated 5 months ago
- NaturalCodeBench (Findings of ACL 2024) ☆61 · Updated 3 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models". ☆59 · Updated 6 months ago
- Official implementation of the paper How to Understand Whole Repository? New SOTA on SWE-bench Lite (21.3%) ☆71 · Updated 2 months ago
- ☆81 · Updated 9 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation? ☆102 · Updated 2 months ago
- Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev… ☆33 · Updated 2 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems (ICLR 2024) ☆141 · Updated 5 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs ☆237 · Updated last month
- Related works and background techniques for OpenAI o1 ☆193 · Updated last week
- ☆302 · Updated 4 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 10 months ago
- ☆87 · Updated last month
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation". ☆231 · Updated 2 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆119 · Updated 3 months ago
- Code for the paper "Training Software Engineering Agents and Verifiers with SWE-Gym" ☆209 · Updated last week
- ☆92 · Updated 9 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023) ☆126 · Updated 5 months ago