bigcode-project / pii-lib
Code for PII detection and redaction in code datasets
☆11 · Updated last year
Related projects:
- Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task. ☆79 · Updated 2 weeks ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆99 · Updated last month
- evol augment any dataset online ☆55 · Updated last year
- This repository contains all the code for collecting large scale amounts of code from GitHub. ☆105 · Updated last year
- StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation ☆221 · Updated 2 months ago
- A multi-programming language benchmark for LLMs ☆189 · Updated this week
- Repository for analysis and experiments in the BigCode project. ☆113 · Updated 6 months ago
- RepoQA: Evaluating Long-Context Code Understanding ☆96 · Updated this week
- Harness used to benchmark aider against SWE Bench benchmarks ☆44 · Updated 2 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆69 · Updated 7 months ago
- ☆73 · Updated last year
- ☆111 · Updated last year
- Code for the paper "Efficient Training of Language Models to Fill in the Middle" ☆162 · Updated last year
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆86 · Updated 3 months ago
- A repository to perform self-instruct with a model on HF Hub ☆30 · Updated 11 months ago
- ☆86 · Updated last year
- Multi-Domain Expert Learning ☆67 · Updated 7 months ago
- Just a bunch of benchmark logs for different LLMs ☆112 · Updated last month
- [ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation". ☆211 · Updated last month
- ☆75 · Updated 3 weeks ago
- Mixing Language Models with Self-Verification and Meta-Verification ☆96 · Updated 10 months ago
- Accepted by Transactions on Machine Learning Research (TMLR) ☆115 · Updated 8 months ago
- ✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024 ☆129 · Updated last month
- Functional Benchmarks and the Reasoning Gap ☆74 · Updated last month
- ☆89 · Updated 11 months ago
- ☆251 · Updated last year
- Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto… ☆192 · Updated 4 months ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆102 · Updated 3 months ago
- Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code" ☆173 · Updated 3 weeks ago
- This is the repo for the paper Shepherd -- A Critic for Language Model Generation ☆207 · Updated last year