QuantaAlpha / GitTaskBench
Repo-level benchmark for real-world Code Agents: from repo understanding → env setup → incremental dev/bug-fixing → task delivery, with cost-aware α metric.
☆236Updated 2 months ago
Alternatives and similar repositories for GitTaskBench
Users who are interested in GitTaskBench are comparing it to the repositories listed below
- Marco Search Agent for Realistic and Challenging Agentic Search☆236Updated last month
- ☆356Updated 5 months ago
- We introduce temporal working memory (TWM), which aims to enhance the temporal modeling capabilities of Multimodal foundation models (MFM…☆311Updated 10 months ago
- Group Expectation Policy Optimization for Heterogeneous Reinforcement Learning☆163Updated last week
- [TKDE2025] Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | A curated list of resources (surveys, papers, benchma…☆768Updated 2 weeks ago
- ☆86Updated 9 months ago
- A powerful multi-format file parsing, data cleaning, and AI annotation toolkit.☆141Updated 3 weeks ago
- Tokenize The Virtual Agents Onchain☆241Updated 5 months ago
- ☆138Updated 5 months ago
- The Python implementation of some deep text hashing (also called deep semantic hashing) Models☆78Updated 3 weeks ago
- A multiscale multimodal large language model for radiology report generation (RRG) tasks☆267Updated 3 months ago
- Science-Star: A Platform for Building, Extending, and Experimenting with Scientific Agents.☆737Updated last month
- This repo collects research papers that use AI tools and are in the field of scientific research (including computer science, agronomy, c…☆98Updated 8 months ago
- [COLM 2025] Assessing Judging Bias in Large Reasoning Models: An Empirical Study https://openreview.net/pdf?id=SlRtFwBdzP☆164Updated 2 months ago
- ☆174Updated 2 months ago
- A multimodal personal assistant that allows Large Language Models (LLMs) to run code locally, acting as an autonomous agent capable of co…☆206Updated 10 months ago
- A database operations and data analysis AI agent☆430Updated 2 months ago
- [ACL 2025 Oral] QAEncoder: Towards Aligned Representation Learning in Question Answering Systems☆176Updated 4 months ago
- F²-Gen - An open-source Financial Fraud Detection Data Generator Web Application☆366Updated last month
- [BIRD-INTERACT] Re-imagines Text-to-SQL evaluation via the lens of dynamic interactions.☆451Updated last week
- The 1st dynamic phishing kit dataset☆202Updated 9 months ago
- React Secure State☆171Updated last month
- ☆162Updated 7 months ago
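- Spring project: a personalized navigation service that supports configurable weights for time, price, and distance, and updates road conditions and estimated arrival times based on the driving status of a large number of users☆22Updated 7 months ago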
- The code of AMoPO: Adaptive Multi-objective Preference Optimization without Rewards and References.☆46Updated 2 months ago
- Enhanced Benchmark Creation Tool: Automates dataset profiling, model benchmarking, and performance visualization for streamlined evaluati…☆110Updated 2 weeks ago
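- 🧠YORO: an enterprise RAG customer-service framework with lower token consumption and higher efficiency: repeated questions need RAG only once! Click to try the web version☆83Updated 2 weeks ago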
- Fast and free zeroshot lipsync MCP server☆90Updated 6 months ago
- Launching the "Agent Creation Toolkit", providing developers with an intuitive and efficient Development Environment, supporting the rapi…☆202Updated 8 months ago
- ☆515Updated 9 months ago