QuantaAlpha / GitTaskBench
Repo-level benchmark for real-world Code Agents: from repo understanding → env setup → incremental dev/bug-fixing → task delivery, with cost-aware α metric.
☆226Updated 3 weeks ago
Alternatives and similar repositories for GitTaskBench
Users who are interested in GitTaskBench are comparing it to the repositories listed below.
- We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFM…☆310Updated 8 months ago
- ☆353Updated 3 months ago
- Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning☆154Updated last week
- ☆137Updated 3 months ago
- A multiscale multimodal large language model for radiology report generation (RRG) tasks☆263Updated 2 months ago
- [TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchma…☆375Updated last week
- A real-time interactive Omni Avatar built on LiveKit, which allows you to seamlessly integrate with any open source Avatar components (re…☆185Updated this week
- Tokenize The Virtual Agents Onchain☆242Updated 4 months ago
- [COLM 2025] Assessing Judging Bias in Large Reasoning Models: An Empirical Study https://arxiv.org/abs/2504.09946☆164Updated 3 weeks ago
- ☆162Updated 6 months ago
- Joint Semantic Detection and Dissemination Control of Phishing Attacks on Social Media via LLama-Based Modeling☆401Updated this week
- An MCP service that automates data analysis through IPython sessions.☆159Updated 2 months ago
- A powerful multi-format file parsing, data cleaning, and AI annotation toolkit.☆140Updated last week
- A database operations and data analysis AI agent☆427Updated last month
- React Secure State☆171Updated 2 months ago
- 🔐 Enterprise-grade AI API security proxy: securely access the DeepSeek API without exposing keys in the frontend☆57Updated 2 months ago
- ☆160Updated 2 months ago
- ☆287Updated 3 months ago
- A project that aims to improve LLMs' pixel reasoning ability.☆81Updated last month
- [ACL 2025 Oral] QAEncoder: Towards Aligned Representation Learning in Question Answering Systems☆175Updated 3 months ago
- F²-Gen - An open-source Financial Fraud Detection Data Generator Web Application☆362Updated 2 months ago
- ☆85Updated 7 months ago
- ☆301Updated 2 weeks ago
- Revolutionizing Cancer Treatment with AI & Robotics☆65Updated 7 months ago
- A lightweight intelligent agent framework implementing the complete ReAct pattern☆172Updated 2 months ago
- ☆130Updated 4 months ago
- [ACL 2025] FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation☆55Updated 3 months ago
- (LLM) A Sparse Activation Architecture for Green Artificial Intelligence: The Energy Efficiency Optimization Language Model AliceSkyGarde…☆165Updated 3 months ago
- The 1st dynamic phishing kit dataset☆201Updated 8 months ago
- ☆218Updated 4 months ago