bytedance / SandboxFusion
☆121Updated 2 weeks ago
Alternatives and similar repositories for SandboxFusion:
Users interested in SandboxFusion are comparing it to the libraries listed below
- A flexible and efficient training framework for large-scale alignment tasks☆304Updated last week
- Codes for the paper "∞Bench: Extending Long Context Evaluation Beyond 100K Tokens": https://arxiv.org/abs/2402.13718☆307Updated 4 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆233Updated 3 months ago
- A Comprehensive Benchmark for Software Development.☆93Updated 8 months ago
- The repository for the paper "DebugBench: Evaluating Debugging Capability of Large Language Models".☆62Updated 7 months ago
- Official Repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale"☆217Updated this week
- ACL 2024 | LooGLE: Long Context Evaluation for Long-Context Language Models☆175Updated 4 months ago
- Related works and background techniques for OpenAI o1☆210Updated last month
- ☆153Updated 5 months ago
- Inference code of Lingma SWE-GPT☆188Updated 2 months ago
- ☆318Updated 7 months ago
- CodeRAG-Bench: Can Retrieval Augment Code Generation?☆109Updated 3 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆148Updated this week
- ☆258Updated 6 months ago
- A series of technical report on Slow Thinking with LLM☆411Updated last week
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024☆144Updated 6 months ago
- Implementation of paper Data Engineering for Scaling Language Models to 128K Context☆451Updated 11 months ago
- GPT-Fathom is an open-source and reproducible LLM evaluation suite, benchmarking 10+ leading open-source and closed-source LLMs as well a…☆349Updated 10 months ago
- Official repository for our paper "FullStack Bench: Evaluating LLMs as Full Stack Coders"☆70Updated 2 months ago
- [EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs☆241Updated 2 months ago
- ☆304Updated 5 months ago
- Codev-Bench (Code Development Benchmark), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev…☆36Updated 3 months ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation☆125Updated 4 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning☆207Updated last month
- ☆57Updated 2 months ago
- [ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark☆370Updated 7 months ago
- CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion (NeurIPS 2023)☆130Updated 6 months ago
- NaturalCodeBench (Findings of ACL 2024)☆62Updated 4 months ago