CoIR-team / coirLinks

(ACL 2025 Main) A Comprehensive Benchmark for Code Information Retrieval.

☆129

Alternatives and similar repositories for coir

Users that are interested in coir are comparing it to the libraries listed below

Sorting:

IAAR-Shanghai / Grimoire
Grimoire is All You Need for Enhancing Large Language Models
☆117Updated last year
OPPO-PersonalAI / TaskCraft
A library for generating difficulty-scalable, multi-tool, and verifiable agentic tasks with execution trajectories.
☆164Updated 3 months ago
HSLiu-Initial / CtrlA
This includes the original implementation of CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control.
☆62Updated last year
GreenBitAI / green-bit-llm
A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs.
☆184Updated 2 months ago
yongchao98 / CodeSteer-v1.0
Code and dataset of CodeSteer
☆88Updated 6 months ago
zou-group / avatar
(NeurIPS 2024) AvaTaR: Optimizing LLM Agents for Tool Usage via Contrastive Reasoning
☆228Updated 4 months ago
WeixiangYAN / CodeTransOcean
[EMNLP 2023] CodeTransOcean: A Comprehensive Multilingual Benchmark for Code Translation
☆57Updated last year
RLHFlow / Self-rewarding-reasoning-LLM
Recipes to train the self-rewarding reasoning LLMs.
☆226Updated 7 months ago
IAAR-Shanghai / ICSFSurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasonin…
☆169Updated 10 months ago
dvlab-research / MoTCoder
This is the official code repository of MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tas…
☆85Updated 6 months ago
Ablustrund / MPLSandbox
MPLSandbox is an out-of-the-box multi-programming language sandbox designed to provide unified and comprehensive feedback from compiler a…
☆177Updated 5 months ago
syr-cn / AutoRefine
[NeurIPS 2025 Poster] Search and Refine During Think: Autonomous Retrieval‑Augmented Reasoning of LLMs
☆100Updated this week
WeixiangYAN / CodeScope
[ACL 2024] CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and …
☆100Updated last year
yiyihum / da-code
[EMNLP 2024] DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
☆76Updated 2 months ago
RLHFlow / Minimal-RL
☆240Updated 4 months ago
zhiyuanhubj / Meta-Ability-Alignment
Official code of paper "Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models"
☆79Updated 4 months ago
luo-junyu / RobustFT
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
☆42Updated 9 months ago
IAAR-Shanghai / UHGEval
[ACL 2024] User-friendly evaluation framework: Eval Suite & Benchmarks: UHGEval, HaluEval, HalluQA, etc.
☆175Updated 4 months ago
Mercury7353 / PyBench
LLM Benchmark for Code
☆31Updated last year
Wuyxin / collabllm
(ICML'25 Outstanding) CollabLLM: From Passive Responders to Active Collaborators
☆232Updated 2 weeks ago
SAILResearch / awesome-foundation-model-leaderboards
A curated list of awesome leaderboard-oriented resources for foundation models
☆284Updated this week
YangLinyi / GLUE-X
We leverage 14 datasets as OOD test data and conduct evaluations on 8 NLU tasks over 21 popularly used models. Our findings confirm that …
☆93Updated 2 years ago
facebookresearch / DocAgent
DocAgent is a system designed to generate high-quality, context-aware code documentation for Python codebases using a multi-agent approac…
☆374Updated 5 months ago
OpenDCAI / RARE
Official implementation of RARE: Retrieval-Augmented Reasoning Modeling
☆182Updated 4 months ago
allenai / genesys
Source code and utilities for the Genesys distributed language model architecture discovery system.
☆130Updated 3 months ago
wyu97 / GenRead
Code and Checkpoints for "Generate rather than Retrieve: Large Language Models are Strong Context Generators" in ICLR 2023.
☆289Updated 2 years ago
Ledzy / StreamBP
Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs".
☆70Updated 3 months ago
uw-nsl / TinyV
Your efficient and accurate answer verification system for RL training.
☆40Updated 3 months ago
Yueeeeeeee / HRPO
[NeurIPS 2025] Hybrid Latent Reasoning via Reinforcement Learning
☆153Updated 3 weeks ago
HaoAreYuDong / Large-Language-Models-for-Tabular-Data
☆52Updated 11 months ago