bigcode-project / pii-lib

Code for PII detection and redaction in code datasets

☆11

Related projects:

swe-bench / experiments
Open sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
☆79 Updated 2 weeks ago
facebookresearch / cruxeval
CRUXEval: Code Reasoning, Understanding, and Execution Evaluation
☆99 Updated last month
theblackcat102 / evol-dataset
evol augment any dataset online
☆55 Updated last year
CarperAI / Code-Pile
This repository contains all the code for collecting large scale amounts of code from GitHub.
☆105 Updated last year
bigcode-project / starcoder2-self-align
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation
☆221 Updated 2 months ago
nuprl / MultiPL-E
A multi-programming language benchmark for LLMs
☆189 Updated this week
bigcode-project / bigcode-analysis
Repository for analysis and experiments in the BigCode project.
☆113 Updated 6 months ago
evalplus / repoqa
RepoQA: Evaluating Long-Context Code Understanding
☆96 Updated this week
paul-gauthier / aider-swe-bench
Harness used to benchmark aider against SWE Bench benchmarks
☆44 Updated 2 months ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆69 Updated 7 months ago
nyu-mll / ILF-for-code-generation
☆73 Updated last year
shrivastavadisha / repo_level_prompt_generation
☆111 Updated last year
openai / human-eval-infilling
Code for the paper "Efficient Training of Language Models to Fill in the Middle"
☆162 Updated last year
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆86 Updated 3 months ago
ArmelRandy / Self-instruct
A repository to perform self-instruct with a model on HF Hub
☆30 Updated 11 months ago
my-other-github-account / llm-humaneval-benchmarks
☆86 Updated last year
huu4ontocord / MDEL
Multi-Domain Expert Learning
☆67 Updated 7 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆112 Updated last month
xlang-ai / DS-1000
[ICML 2023] Data and code release for the paper "DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation".
☆211 Updated last month
cognitivecomputations / spectrum
☆75 Updated 3 weeks ago
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆96 Updated 10 months ago
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆115 Updated 8 months ago
Leolty / repobench
✨ RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems - ICLR 2024
☆129 Updated last month
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆74 Updated last month
abacaj / train-with-fsdp
☆89 Updated 11 months ago
nlpxucan / evol-instruct
☆251 Updated last year
Mihaiii / llm_steer
Steer LLM outputs towards a certain topic/subject and enhance response capabilities using activation engineering by adding steering vecto…
☆192 Updated 4 months ago
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆102 Updated 3 months ago
LiveCodeBench / LiveCodeBench
Official repository for the paper "LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code"
☆173 Updated 3 weeks ago
facebookresearch / Shepherd
This is the repo for the paper Shepherd -- A Critic for Language Model Generation
☆207 Updated last year