☆45Jan 21, 2026Updated last month
Alternatives and similar repositories for SWE-QA-Bench
Users that are interested in SWE-QA-Bench are comparing it to the libraries listed below
Sorting:
- SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution☆25Nov 11, 2025Updated 3 months ago
- Interface for GenAI-Arena [NeurIPS24]☆17Feb 27, 2024Updated 2 years ago
- SWE-Exp: Experience-Driven Software Issue Resolution☆35Oct 17, 2025Updated 4 months ago
- LongCodeZip: Compress Long Context for Code Language Models [ASE2025]☆141Feb 5, 2026Updated last month
- ☆48Oct 28, 2025Updated 4 months ago
- Repository of IPBench☆19Jan 4, 2026Updated 2 months ago
- [CVPR2024] Learning from Synthetic Human Group Activities☆14Feb 24, 2025Updated last year
- 本仓库为RoboMaster机甲大师比赛西安交通大学笃行战队视觉组的知识库,基于Obsidian☆15Nov 5, 2023Updated 2 years ago
- DOMAINEVAL is an auto-constructed benchmark for multi-domain code generation that consists of 2k+ subjects (i.e., description, reference …☆14Dec 12, 2024Updated last year
- GBM implementation on Legate☆14Jan 28, 2026Updated last month
- [NeurIPS 2025] Official code for "Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms"☆23Oct 23, 2025Updated 4 months ago
- A framework for few-shot evaluation of autoregressive language models.☆12Jul 14, 2025Updated 7 months ago
- ☆12Jan 11, 2026Updated last month
- A Swedish Natural Language Understanding Benchmark☆11Dec 12, 2025Updated 2 months ago
- Reinforced Multi-LLM Agents training☆73Jan 18, 2026Updated last month
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs☆49Nov 29, 2024Updated last year
- Code for the paper "FinRLlama: A Solution to LLM-Engineered Signals Challenge at FinRL Contest 2024"☆13Feb 14, 2025Updated last year
- Dataset for AAAI paper "Natural Language Inference in Context - Investigating Contextual Reasoning over Long Texts"☆11Nov 18, 2022Updated 3 years ago
- benchmarks for evaluating MT models☆11Jun 26, 2024Updated last year
- 中文金融大模型测评基准,六大类二十五任务、等级化评价,国内模型获得A级☆10May 6, 2024Updated last year
- This is the implementation of the 4th place solution (yu4u's part) for RSNA 2024 Lumbar Spine Degenerative Classification at Kaggle.☆10Oct 11, 2024Updated last year
- Code for our project CROWN (Conversational Passage Ranking by Reasoning over Word Networks)☆10Jan 11, 2024Updated 2 years ago
- ☆15Jul 26, 2022Updated 3 years ago
- ☆12May 23, 2024Updated last year
- Mixture of Expert (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Feb 11, 2024Updated 2 years ago
- Long Context Research☆29Jan 26, 2026Updated last month
- GisPy: A Tool for Measuring Gist Inference Score in Text https://aclanthology.org/2022.wnu-1.5/☆13Jul 1, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- Look Back to Reason Forward: Revisitable Memory for Long-Context LLM Agents☆23Feb 21, 2026Updated 2 weeks ago
- Fast search index for SPLADE sparse retrieval models implemented in Python using Numpy and Numba☆36Oct 16, 2025Updated 4 months ago
- Survey of available speech datasets for Polish ASR development☆17Jan 1, 2025Updated last year
- Website for release of TellMeWhy dataset for why question answering☆14Nov 11, 2022Updated 3 years ago
- ☆12Nov 5, 2024Updated last year
- Code for our paper: "Building A Coding Assistant via Retrieval-Augmented Language Models"☆10Nov 2, 2024Updated last year
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search☆17Jan 24, 2026Updated last month
- DependEval: a hierarchical benchmark for evaluating LLMs on repository-level code understanding across 8 programming languages.☆15Jul 28, 2025Updated 7 months ago
- Shaping Language Models with Cognitive Insights☆15Feb 29, 2024Updated 2 years ago
- ☆15Jan 12, 2026Updated last month
- Code and Data for GlitchBench☆13Feb 27, 2024Updated 2 years ago