QuantaAlpha / GitTaskBench
Repo-level benchmark for real-world code agents, covering repo understanding → environment setup → incremental development/bug fixing → task delivery, scored with a cost-aware α metric.
☆240 · Updated 2 months ago
Alternatives and similar repositories for GitTaskBench
Users interested in GitTaskBench are comparing it to the repositories listed below.
- Marco Search Agent for Realistic and Challenging Agentic Search ☆238 · Updated last month
- ☆356 · Updated 5 months ago
- ☆200 · Updated last week
- Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning ☆164 · Updated last month
- A reading list for trustworthy audio large language models. ☆109 · Updated this week
- This repo collects research papers that apply AI tools to scientific research (including computer science, agronomy, c… ☆98 · Updated 9 months ago
- We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of multimodal foundation models (MFM… ☆312 · Updated 3 weeks ago
- Tokenize The Virtual Agents Onchain ☆242 · Updated 6 months ago
- ☆175 · Updated 3 months ago
- A powerful multi-format file parsing, data cleaning, and AI annotation toolkit. ☆142 · Updated last week
- A Python implementation of several deep text hashing (also called deep semantic hashing) models ☆80 · Updated 2 weeks ago
- Official repository of DARE: dLLM Alignment and Reinforcement Executor ☆117 · Updated this week
- Science-Star: A Platform for Building, Extending, and Experimenting with Scientific Agents. ☆738 · Updated 2 months ago
- ☆86 · Updated 9 months ago
- ☆138 · Updated 5 months ago
- [ACL 2025 Oral] QAEncoder: Towards Aligned Representation Learning in Question Answering Systems ☆176 · Updated 5 months ago
- ☆70 · Updated 2 months ago
- ☆126 · Updated 2 months ago
- ☆332 · Updated last month
- ☆57 · Updated 3 weeks ago
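- 超能文献 | AI-powered document translation and academic search service. Supports high-quality translation of PDF, DOCX, PPTX, and other document formats (11 languages supported), with special optimization for translating mathematical formulas. Also provides intelligent search of PubMed academic literature. More at: https://suppr.wilddata.cn ☆170 · Updated last month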
- ☆219 · Updated 6 months ago
- [COLM 2025] Assessing Judging Bias in Large Reasoning Models: An Empirical Study (https://openreview.net/pdf?id=SlRtFwBdzP) ☆164 · Updated 2 months ago
- Dataset and evaluation code of ISDrama (ACM-MM 2025): Immersive Spatial Drama Generation through Multimodal Prompting ☆236 · Updated 3 months ago
- React Secure State ☆171 · Updated last month
- [BIRD-INTERACT] Re-imagines Text-to-SQL evaluation through the lens of dynamic interactions. ☆453 · Updated last month
- [AAAI 2026 Oral] Cook and Clean Together: Teaching Embodied Agents for Parallel Task Execution ☆355 · Updated last week
- Llama from scratch in Go. ☆104 · Updated last month
- 🔐 Enterprise-grade AI API security proxy: securely access the DeepSeek API without exposing keys on the frontend ☆57 · Updated 4 months ago
- 4th Place Solution for the Kaggle Competition: LMSYS - Chatbot Arena Human Preference Predictions ☆171 · Updated last year