ProsusAI / stack-eval
Official implementation for the paper, StackEval: Benchmarking LLMs in Coding Assistance
☆13Updated 6 months ago
Alternatives and similar repositories for stack-eval
Users who are interested in stack-eval are comparing it to the repositories listed below
- Code for "Learning Structural Edits via Incremental Tree Transformations" (ICLR'21)☆41Updated 3 years ago
- ☆26Updated 4 months ago
- Releasing code for "ReCode: Robustness Evaluation of Code Generation Models"☆52Updated last year
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆56Updated 2 years ago
- Source Code for ACL-21 main conference paper "CoSQA: 20,000+ Web Queries for Code Search and Question Answering".☆44Updated 2 years ago
- ☆30Updated 2 months ago
- Documenting large text datasets 🖼️ 📚☆12Updated 5 months ago
- The LM Contamination Index is a manually created database of contamination evidences for LMs.☆78Updated last year
- Training and Benchmarking LLMs for Code Preference.☆33Updated 6 months ago
- PyTorch code for the RetoMaton paper: "Neuro-Symbolic Language Modeling with Automaton-augmented Retrieval" (ICML 2022)☆71Updated 2 years ago
- [ACL 2023] Code for ContraCLM: Contrastive Learning For Causal Language Model☆33Updated last year
- This repository contains data, code and models for contextual noncompliance.☆22Updated 10 months ago
- ☆43Updated 3 months ago
- Few-shot Learning with Auxiliary Data☆27Updated last year
- [NeurIPS 2024] Evaluation harness for SWT-Bench, a benchmark for evaluating LLM repository-level test-generation☆48Updated last month
- Baselines for all tasks from Long Code Arena benchmarks 🏟️☆30Updated last month
- [EACL 2024] ICE-Score: Instructing Large Language Models to Evaluate Code☆76Updated 11 months ago
- InstructCoder: Instruction Tuning Large Language Models for Code Editing | Oral ACL-2024 SRW☆62Updated 7 months ago
- This repository includes a benchmark and code for the paper "Evaluating LLMs at Detecting Errors in LLM Responses".☆29Updated 9 months ago
- ☆44Updated 6 months ago
- Astraios: Parameter-Efficient Instruction Tuning Code Language Models☆57Updated last year
- VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning☆38Updated 2 years ago
- An extension of the Transformers library to include a T5ForSequenceClassification class.☆38Updated 2 years ago
- ☆124Updated 2 years ago
- Official code of our work, AVATAR: A Parallel Corpus for Java-Python Program Translation.☆54Updated 9 months ago
- [EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation☆48Updated last year
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…☆14Updated 2 years ago
- ☆24Updated 6 months ago
- Deep Just-In-Time Inconsistency Detection Between Comments and Source Code: Artifact☆22Updated 2 years ago
- In-context Example Selection with Influences☆15Updated 2 years ago