openai / code-align-evals-data
☆28 Updated 3 years ago
Alternatives and similar repositories for code-align-evals-data
Users who are interested in code-align-evals-data are comparing it to the repositories listed below
- Code for the paper "Efficient Training of Language Models to Fill in the Middle" ☆183 Updated 2 years ago
- Code for the TMLR 2023 paper "PPOCoder: Execution-based Code Generation using Deep Reinforcement Learning" ☆114 Updated last year
- ☆151 Updated 4 years ago
- A hard gym for programming ☆159 Updated last year
- Repository for analysis and experiments in the BigCode project. ☆120 Updated last year
- Minimal library to train LLMs on TPU in JAX with pjit(). ☆290 Updated last year
- Code accompanying the paper Pretraining Language Models with Human Preferences ☆182 Updated last year
- ☆238 Updated 2 years ago
- Code for paper "LEVER: Learning to Verify Language-to-Code Generation with Execution" (ICML'23) ☆89 Updated 2 years ago
- Code for the curation of The Stack v2 and StarCoder2 training data ☆109 Updated last year
- Accepted by Transactions on Machine Learning Research (TMLR) ☆130 Updated 9 months ago
- ☆180 Updated 2 years ago
- A set of utilities for running few-shot prompting experiments on large-language models ☆122 Updated last year
- ☆119 Updated last year
- ☆110 Updated last year
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆126 Updated last year
- A multi-programming language benchmark for LLMs ☆260 Updated 3 weeks ago
- xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval ☆84 Updated 10 months ago
- Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks ☆209 Updated last year
- The data processing pipeline for the Koala chatbot language model ☆117 Updated 2 years ago
- CRUXEval: Code Reasoning, Understanding, and Execution Evaluation ☆149 Updated 9 months ago
- ☆78 Updated 3 months ago
- HellaSwag: Can a Machine _Really_ Finish Your Sentence? ☆211 Updated 5 years ago
- For experiments involving instruct gpt. Currently used for documenting open research questions. ☆71 Updated 2 years ago
- a Fine-tuned LLaMA that is Good at Arithmetic Tasks ☆178 Updated last year
- Simple next-token-prediction for RLHF ☆227 Updated last year
- Training language models to make programs faster ☆91 Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆154 Updated last year
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆161 Updated last year
- A (somewhat) minimal library for finetuning language models with PPO on human feedback. ☆85 Updated 2 years ago