CarperAI / Code-Pile

This repository contains all the code for collecting large amounts of code from GitHub.

☆110

Alternatives and similar repositories for Code-Pile

Users who are interested in Code-Pile are comparing it to the libraries listed below.

EleutherAI / github-downloader
Script for downloading GitHub.
☆96 Updated last year
openai / human-eval-infilling
Code for the paper "Efficient Training of Language Models to Fill in the Middle"
☆183 Updated 2 years ago
EleutherAI / stackexchange-dataset
Python tools for processing the stackexchange data dumps into a text dataset for Language Models
☆83 Updated last year
nyu-mll / ILF-for-code-generation
☆78 Updated 4 months ago
zorazrw / odex
[EMNLP'23] Execution-Based Evaluation for Open Domain Code Generation
☆48 Updated last year
VITA-Group / ChainCoder
[ICML 2023] "Outline, Then Details: Syntactically Guided Coarse-To-Fine Code Generation", Wenqing Zheng, S P Sharan, Ajay Kumar Jaiswal, …
☆40 Updated last year
CarperAI / InstructGPT
For experiments involving instruct gpt. Currently used for documenting open research questions.
☆71 Updated 2 years ago
google-research / babelcode
☆52 Updated 5 months ago
bigcode-project / bigcode-analysis
Repository for analysis and experiments in the BigCode project.
☆121 Updated last year
emrgnt-cmplxty / zero-shot-replication
☆74 Updated last year
kernelmachine / cbtm
Code repository for the c-BTM paper
☆107 Updated last year
xingyaoww / LeTI
Official repo for NAACL 2024 Findings paper "LeTI: Learning to Generate from Textual Interactions."
☆65 Updated 2 years ago
shuyanzhou / docprompting
Data and code for "DocPrompting: Generating Code by Retrieving the Docs" @ICLR 2023
☆248 Updated last year
huu4ontocord / MDEL
Multi-Domain Expert Learning
☆67 Updated last year
Zyq-scut / RLTF
Accepted by Transactions on Machine Learning Research (TMLR)
☆130 Updated 10 months ago
reasoning-machines / prompt-lib
A set of utilities for running few-shot prompting experiments on large-language models
☆122 Updated last year
benlipkin / probsem
Probabilistic LLM evaluations. [CogSci2023; ACL2023]
☆73 Updated last year
salesforce / CodeGen2
CodeGen2 models for program synthesis
☆272 Updated 2 years ago
niansong1996 / lever
Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23)
☆89 Updated 2 years ago
my-other-github-account / llm-humaneval-benchmarks
☆84 Updated 2 years ago
CarperAI / cheese
Used for adaptive human in the loop evaluation of language and embedding models.
☆311 Updated 2 years ago
princeton-nlp / intercode
[NeurIPS 2023 D&B] Code repository for InterCode benchmark https://arxiv.org/abs/2306.14898
☆223 Updated last year
HazyResearch / TART
TART: A plug-and-play Transformer module for task-agnostic reasoning
☆200 Updated 2 years ago
Eureka6174 / LearnNLPlan
Learning to Program with Natural Language
☆6 Updated last year
automix-llm / automix
Mixing Language Models with Self-Verification and Meta-Verification
☆105 Updated 7 months ago
moirage / alignment-research-dataset
A dataset of alignment research and code to reproduce it
☆77 Updated 2 years ago
EleutherAI / DeeperSpeed
DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.
☆168 Updated 2 weeks ago
curai / curai-research
☆94 Updated 7 months ago
CarperAI / autocrit
A repository for transformer critique learning and generation
☆90 Updated last year
dpfried / incoder
Generative model for code infilling and synthesis
☆304 Updated last year