Official implementation for the paper, StackEval: Benchmarking LLMs in Coding Assistance, https://arxiv.org/abs/2412.05288
☆20Oct 30, 2024Updated last year
Alternatives and similar repositories for stack-eval
Users who are interested in stack-eval are comparing it to the libraries listed below
- ☆11Sep 7, 2023Updated 2 years ago
- ☆17Dec 30, 2023Updated 2 years ago
- A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories☆36Sep 4, 2024Updated last year
- Reproducing R1 for Code with Reliable Rewards☆12Apr 9, 2025Updated 11 months ago
- Applications of graph neural networks in recommender systems☆13Aug 26, 2021Updated 4 years ago
- Code for ICML2020 "Sequence Generation with Mixed Representations"☆12Jun 27, 2020Updated 5 years ago
- Local lightning-fast semantic code search built for agents☆39Updated this week
- ☆15Oct 4, 2024Updated last year
- ☆14Feb 18, 2025Updated last year
- Baselines for all tasks from Long Code Arena benchmarks 🏟️☆39Mar 30, 2025Updated 11 months ago
- codes of LEGNN for Semi-supervised Node Classification☆12Jun 1, 2022Updated 3 years ago
- A repository of code examples to accompany the LSU CSC7809/7700/47000 course on AI foundation models.☆13Apr 5, 2025Updated 11 months ago
- A collection of publications that works on code models but beyond focusing on the accuracies.☆13Jun 30, 2023Updated 2 years ago
- LockManager with deadlock detection for implementing 2PL☆13Mar 13, 2019Updated 7 years ago
- The Infibench variant of bigcode-evaluation-harness --- a framework for the evaluation of autoregressive code generation language models.☆14Oct 19, 2024Updated last year
- This is the github to open source benchmark AdvancedIF, see LAMA L1387358RCRO☆30Nov 26, 2025Updated 3 months ago
- ☆10Feb 17, 2020Updated 6 years ago
- Node classification using graph convolutional networks☆11Mar 23, 2020Updated 5 years ago
- ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry☆46Jan 5, 2026Updated 2 months ago
- An easy way to view current and overall statistics for corona virus in your terminal☆11Jun 12, 2020Updated 5 years ago
- Pytorch Implementation of LoG 22 [Oral] -- Transductive Linear Probing: A Novel Framework for Few-Shot Node Classification☆17May 31, 2023Updated 2 years ago
- The replication package of <Sentiment Analysis for Software Engineering: How Far Can Pre-trained Transformer Models Go?>. Accepted by IC…☆11Nov 29, 2023Updated 2 years ago
- Constructed a structured heterogeneous text corpus graph to transform text classification problem into a node classification problem. Cr…☆14Oct 15, 2019Updated 6 years ago
- Code for our paper "Learning to Generate Unit Tests for Automated Debugging"☆17Mar 7, 2025Updated last year
- Implementation of the Paper "Goal-Driven Explainable Clustering via Language Descriptions"☆40May 24, 2023Updated 2 years ago
- This is the code for the paper "Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation".☆37Sep 1, 2025Updated 6 months ago
- AI Scientist by Chicago Human+AI Lab☆30Mar 12, 2026Updated last week
- Code release for "TempLM: Distilling Language Models into Template-Based Generators"☆14Jul 21, 2022Updated 3 years ago
- ☆11Nov 25, 2020Updated 5 years ago
- This is a ROS catkin workspace for a robot in frc☆14Dec 16, 2020Updated 5 years ago
- A scikit-learn compliant implementation of Monroe et al.'s Fightin' Words analysis method.☆11Mar 10, 2019Updated 7 years ago
- ☆28Nov 10, 2025Updated 4 months ago
- Multilingual Code Co-Evolution Using Large Language Models☆13Dec 8, 2024Updated last year
- ☆43Jun 12, 2023Updated 2 years ago
- This repository helps you evaluate your models on the FreshStack benchmark!☆33Dec 9, 2025Updated 3 months ago
- Tool to perform paired evaluation of automatic systems☆13Oct 20, 2021Updated 4 years ago
- Code for the MTEB Arena☆24Jul 2, 2025Updated 8 months ago
- DGL implementation of GRAND(Graph Random Neural Network, NeurIPS 2020)☆18Mar 19, 2021Updated 5 years ago
- Alibaba Cloud TIANCHI NLP Competition☆14Sep 29, 2020Updated 5 years ago